Abstract

In finite mixtures of location-scale distributions, if there is no constraint or
penalty on the parameters, then the maximum likelihood estimator does not exist
because the likelihood is unbounded. To avoid this problem, we consider a penalized
likelihood, where the penalty is a function of the minimum of the ratios of the scale
parameters and the sample size. It is shown that the penalized maximum likelihood
estimator is strongly consistent. We also analyze the consistency of a penalized
maximum likelihood estimator where the penalty is imposed on the scale parameters
themselves.

1 Introduction

In this paper, we prove the strong consistency of a penalized maximum likelihood estimator
for finite mixtures of univariate location-scale distributions, generalizing the results in
Ciuperca, Ridolfi, and Idier (2003). As a special case of this result, we solve an open
problem posed by Hathaway (1985).

As stated in Day (1969), because the likelihood function for finite mixtures of location-
scale distributions is unbounded, the maximum likelihood estimator does not exist. To
see this, consider the simple case in which the model consists of mixtures of two normal
distributions $\alpha_1 \phi(x; \mu_1, \sigma_1) + \alpha_2 \phi(x; \mu_2, \sigma_2)$, and assume that we obtain an i.i.d. sample
$X_1, X_2, \ldots, X_n$ from the true distribution. If we set $\mu_1 = X_1$, then the
likelihood tends to infinity as $\sigma_1$ goes to zero. Hence the likelihood function is unbounded.
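The divergence described above can be observed numerically. The following is a minimal sketch, assuming a small hand-picked sample and fixed values for the second component; the function names and the sample are illustrative and do not come from the paper.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def two_component_loglik(x, alpha1, mu1, sigma1, mu2, sigma2):
    """Log-likelihood of alpha1*N(mu1, sigma1^2) + (1 - alpha1)*N(mu2, sigma2^2)."""
    dens = alpha1 * normal_pdf(x, mu1, sigma1) + (1.0 - alpha1) * normal_pdf(x, mu2, sigma2)
    return float(np.sum(np.log(dens)))

# Hypothetical sample; any sample with distinct points behaves the same way.
x = np.array([-1.3, -0.4, 0.1, 0.7, 1.9])

# Centre the first component on X_1 and let its scale shrink.
shrinking_scales = (1e-1, 1e-3, 1e-5, 1e-7)
logliks = [two_component_loglik(x, 0.5, x[0], s, 0.0, 1.0) for s in shrinking_scales]
for s, value in zip(shrinking_scales, logliks):
    print(f"sigma_1 = {s:.0e}: log-likelihood = {value:.3f}")
```

Only the term belonging to $X_1$ grows, roughly like $-\log \sigma_1$, while the remaining observations are still explained by the second component, so the sum increases without bound.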
A straightforward approach to this problem is to bound the minimum of the variances
of the components from below by a positive constant. By using Theorem 6 in Redner
(1981), we can show that the maximum likelihood estimator restricted to a compact subset
of the parameter space is strongly consistent if the subset contains the true parameter.

Another approach is penalized maximum likelihood estimation. However, if the penalty
is not appropriate, then the penalized likelihood function is still unbounded. Ciuperca, Ridolfi, and Idier
(2003) considered the case that the penalties are imposed on the variances themselves and
proved the consistency of the penalized maximum likelihood estimator. The results given
in Ciuperca, Ridolfi, and Idier (2003) are very useful for estimating the parameters of
mixtures of normal densities because the assumptions for the penalty are easy to check
and their method is easy to implement. In this paper, we extend their consistency
result to the case that the components of mixtures are not normal densities and
the penalty depends on the sample size $n$.



In normal mixture distributions, Hathaway (1985) considered the following constraint
to avoid the divergence of the likelihood:
\[
\min_{m,m'} \frac{\sigma_m}{\sigma_{m'}} \ge b. \tag{1.0.1}
\]
This bounds the minimum of the ratios of the variances of the components by a constant.
He showed that the strong consistency of the maximum likelihood estimator holds if the
true distribution satisfies the constraint represented by equation (1.0.1). Intuitively, a
stronger constraint must be enforced for a smaller sample size to avoid the divergence
of the likelihood, because a component with a very small variance can make a large
contribution to at most a single observation. Therefore, it seems that the constraint under
which the consistency holds can be weakened as the sample size increases. This intuition
leads to the following two questions:

• Is it possible to let b decrease to zero as the sample size n increases to infinity while 
maintaining consistency? 

• If it is possible, then at what rate can b be decreased to zero? 

These questions are mentioned in Hathaway (1985) and McLachlan and Peel (2000), and are
treated there as unsolved problems.



This topic is closely related to the sieve method (see Grenander (1981) and Geman and Hwang
(1982)). For normal mixture distributions, the convergence rate of the maximum
likelihood estimator based on the sieve method is studied in Genovese and Wasserman (2000)
and Ghosal and van der Vaart (2001). In Tanaka and Takemura (2006), for mixtures of
location-scale distributions, we showed the strong consistency of the maximum likelihood
estimator in the case that the scale parameters themselves are bounded from below by
$c_n = e^{-n^d}$, $(0 < d < 1)$. However, we could not solve the original questions when
constraints are imposed on the minimum of the ratios of the variances of the components.

In this paper, we solve the questions treated above in a more general and unified
framework. For mixtures of location-scale distributions, we consider a penalized
likelihood, where the penalty is a function of the minimum of the ratios of the scale parameters
and the sample size $n$. The effect of the penalty becomes stronger as the minimum of the
ratios of the scale parameters decreases to zero. Note that the penalty can depend on
the sample size $n$, so we can weaken the effect of the penalty as $n$ increases
to infinity. In Theorem 1, we show that the consistency holds for the penalized
maximum likelihood estimator. In Corollary 1, the solutions to the questions mentioned in
Hathaway (1985) and McLachlan and Peel (2000) are obtained as special cases of Theorem 1.

We also analyze the consistency of a penalized maximum likelihood estimator in which
the penalties are imposed on the scale parameters themselves. The result obtained in
Theorem 2 is a generalization of Corollary 1 of Ciuperca, Ridolfi, and Idier (2003).

Throughout this paper, we assume that the true distribution is a mixture of location-
scale distributions and the number of components of the true distribution is known.

The organization of this paper is as follows. Section 2 describes notation and regularity
conditions. The main results are stated in Section 3. Section 4 is devoted to the proofs.
We end this paper with concluding remarks in Section 5.

2 Preliminaries 
2.1 Notation 

Mixtures of $M$ location-scale densities are written in the form
\[
f(x;\theta) = \sum_{m=1}^{M} \alpha_m f_m(x;\mu_m,\sigma_m).
\]
The mixing weights $\alpha_1,\ldots,\alpha_M$ have to satisfy $\alpha_m \ge 0$, $\sum_{m=1}^{M}\alpha_m = 1$. We assume
that the components $f_1(x;\mu_1,\sigma_1),\ldots,f_M(x;\mu_M,\sigma_M)$ are location-scale densities, i.e. they
satisfy
\[
f_m(x;\mu_m,\sigma_m) = \frac{1}{\sigma_m}\, f_m\!\left(\frac{x-\mu_m}{\sigma_m};0,1\right), \quad 1 \le m \le M,
\]
where $\mu_m$ and $\sigma_m$ are location parameters and scale parameters respectively. We abbreviate
$(\alpha_1,\mu_1,\sigma_1,\ldots,\alpha_M,\mu_M,\sigma_M)$ as $\theta$, and $(\mu_m,\sigma_m)$ as $\theta_m$. We denote the true parameter
by $\theta_0$.

Let $\Omega_m \equiv \{(\mu_m,\sigma_m) \mid \mu_m \in \mathbb{R},\ \sigma_m \in (0,\infty)\}$ denote the parameter space of the $m$-th
component. Then the entire parameter space can be represented as
\[
\Theta \equiv \Big\{(\alpha_1,\ldots,\alpha_M) \;\Big|\; \sum_{m=1}^{M}\alpha_m = 1,\ \alpha_m \ge 0\Big\} \times \prod_{m=1}^{M}\Omega_m.
\]
For a given sample $X = (X_1,\ldots,X_n)$ from $f(x;\theta_0)$, the likelihood function is defined as
\[
l(\theta;X) \equiv \prod_{i=1}^{n} f(X_i;\theta) = \prod_{i=1}^{n}\Big\{\sum_{m=1}^{M}\alpha_m f_m(X_i;\mu_m,\sigma_m)\Big\}.
\]
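The notation above translates directly into code. The sketch below evaluates a location-scale mixture density and its log-likelihood; it assumes, for simplicity, that all components share one standard density (`base_pdf`, a hypothetical parameter name), whereas the paper allows a different $f_m$ for each component.

```python
import numpy as np

def standard_normal(z):
    """Standard normal density, used here only as a default base density."""
    return np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)

def mixture_density(x, alphas, mus, sigmas, base_pdf=standard_normal):
    """Evaluate sum_m alpha_m * (1/sigma_m) * base_pdf((x - mu_m)/sigma_m) at each x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))[:, None]   # shape (n, 1)
    alphas = np.asarray(alphas, dtype=float)                 # shape (M,)
    mus = np.asarray(mus, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    components = base_pdf((x - mus) / sigmas) / sigmas       # shape (n, M)
    return components @ alphas                               # shape (n,)

def log_likelihood(x, alphas, mus, sigmas, base_pdf=standard_normal):
    """log l(theta; X) = sum_i log f(X_i; theta)."""
    return float(np.sum(np.log(mixture_density(x, alphas, mus, sigmas, base_pdf))))
```

Working with the log-likelihood rather than the product $l(\theta;X)$ avoids numerical underflow for moderate $n$ and matches the form used in the proofs of Section 4.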
Throughout this paper, we fix $M$, the number of components of mixture models. Let
$\mathcal{G}_m$ denote the set of location-scale mixture densities in $\{f(x;\theta) \mid \theta \in \Theta\}$ which consist of
no more than $m$ components. For example, if a mixture density satisfies $\alpha_{m+1} = \cdots =
\alpha_M = 0$, then the density belongs to $\mathcal{G}_m$. Note that $\mathcal{G}_M = \{f(x;\theta) \mid \theta \in \Theta\}$.

Let $\sigma_{(1)}$ and $\sigma_{(M)}$ denote the minimum and the maximum values of the scale parameters:
\[
\sigma_{(1)} \equiv \min_{1 \le m \le M} \sigma_m, \qquad \sigma_{(M)} \equiv \max_{1 \le m \le M} \sigma_m. \tag{2.1.1}
\]
Let $\{c_n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty}$ denote sequences of positive reals which converge to zero. In our
discussion, we use two constrained parameter spaces. Define $\Theta_{c_n}$, $\Theta_{b_n}$ as follows:
\[
\Theta_{c_n} \equiv \{\theta \in \Theta \mid \sigma_{(1)} \ge c_n\}, \qquad
\Theta_{b_n} \equiv \Big\{\theta \in \Theta \;\Big|\; \frac{\sigma_{(1)}}{\sigma_{(M)}} \ge b_n\Big\}.
\]
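A small sketch of membership tests for the two constrained parameter spaces follows. In this subsection $c_n$ and $b_n$ are arbitrary positive sequences converging to zero; the exponential form $c_0 \exp(-n^d)$ used as a default here is the one taken later in the paper, and the default values of `c0`, `b0` and `d` are assumptions made purely for illustration.

```python
import math

def scale_extremes(sigmas):
    """Return (sigma_(1), sigma_(M)), the smallest and largest scale parameter."""
    return min(sigmas), max(sigmas)

def in_theta_c(sigmas, n, c0=1.0, d=0.5):
    """theta is in Theta_{c_n} iff sigma_(1) >= c_n, with c_n = c0 * exp(-n^d)."""
    smallest, _ = scale_extremes(sigmas)
    return smallest >= c0 * math.exp(-(n ** d))

def in_theta_b(sigmas, n, b0=1.0, d=0.5):
    """theta is in Theta_{b_n} iff sigma_(1)/sigma_(M) >= b_n, with b_n = b0 * exp(-n^d)."""
    smallest, largest = scale_extremes(sigmas)
    return smallest / largest >= b0 * math.exp(-(n ** d))
```

The two constraints are genuinely different: a parameter whose scale parameters are all equal and tiny lies in $\Theta_{b_n}$ for every $n$ but outside $\Theta_{c_n}$ until $c_n$ has dropped below that common value.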



2.2 Regularity conditions 



We introduce assumptions for the strong consistency of the maximum likelihood estimator.
These assumptions are essentially the same as in Wald (1949), Redner (1981) and
Tanaka and Takemura (2006).

Let $\Gamma$ denote any compact subset of $\Theta$.

Assumption 1. For $\theta \in \Theta$ and any positive real number $\rho$, let
\[
f(x;\theta,\rho) \equiv \sup_{\mathrm{dist}(\theta',\theta) \le \rho} f(x;\theta'),
\]
where $\mathrm{dist}(\theta,\theta')$ is a distance between $\theta$ and $\theta'$. For each $\theta \in \Gamma$ and sufficiently small $\rho$,
$f(x;\theta,\rho)$ is measurable.

Assumption 2. For each $\theta \in \Gamma$, if $\lim_{j\to\infty}\theta^{(j)} = \theta$, $(\theta^{(j)} \in \Gamma)$, then $\lim_{j\to\infty} f(x;\theta^{(j)}) =
f(x;\theta)$ except on a set which is a null set and does not depend on the sequence $\{\theta^{(j)}\}_{j=1}^{\infty}$.

Assumption 3.
\[
\int \left|\log f(x;\theta_0)\right| f(x;\theta_0)\,dx < \infty.
\]

Furthermore, in Section 3 we impose Assumption 4 or 5 according to the type of
penalty. If the penalty is imposed on the scale parameters themselves, then we
impose Assumption 4. Alternatively, if the penalty is imposed on the ratios of the scale
parameters, then we impose Assumption 5.

Assumption 4. There exist real constants $v_0, v_1 > 0$ and $\beta > 1$ such that
\[
f_m(x;\mu_m = 0,\sigma_m = 1) \le \min\{v_0,\ v_1 \cdot |x|^{-\beta}\}
\]
for all $m$.

Assumption 5. There exist real constants $v_0, v_1 > 0$ and $\beta > 2$ such that
\[
f_m(x;\mu_m = 0,\sigma_m = 1) \le \min\{v_0,\ v_1 \cdot |x|^{-\beta}\}
\]
for all $m$.

Note that Assumption 5 is stronger than Assumption 4. Therefore, if Assumptions 1, 2, 3
and 5 hold, then Assumptions 1, 2, 3 and 4 hold.
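As an illustration of Assumptions 4 and 5, the standard normal density satisfies the tail bound for every $\beta$. The sketch below uses the elementary fact that $|x|^{\beta}\phi(x)$ is maximized at $|x| = \sqrt{\beta}$, which gives explicit constants; the function names are hypothetical, and the normal density is only one example of an admissible component.

```python
import math

def standard_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_tail_constants(beta):
    """Constants (v0, v1) with phi(x) <= min(v0, v1 * |x|^(-beta)) for all x.

    v0 is phi(0); v1 is sup_x |x|^beta * phi(x), attained at |x| = sqrt(beta).
    """
    v0 = 1.0 / math.sqrt(2.0 * math.pi)
    v1 = (beta / math.e) ** (beta / 2.0) / math.sqrt(2.0 * math.pi)
    return v0, v1

def tail_bound(x, beta):
    """Right-hand side min(v0, v1 * |x|^(-beta)) of Assumptions 4 and 5."""
    v0, v1 = normal_tail_constants(beta)
    if x == 0.0:
        return v0  # the power term is infinite at the origin
    return min(v0, v1 * abs(x) ** (-beta))

def bound_holds_on_grid(beta, half_width=10.0, steps=2001):
    """Check the inequality on an evenly spaced grid (small relative tolerance)."""
    xs = [-half_width + 2.0 * half_width * i / (steps - 1) for i in range(steps)]
    return all(standard_normal_pdf(x) <= tail_bound(x, beta) * (1.0 + 1e-12) for x in xs)
```

Because the bound holds for any $\beta$, normal components satisfy Assumption 5 (for example with $\beta = 3$) and hence also Assumption 4.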

2.3 Strong Consistency 

Following Redner (1981), we define strong consistency of estimators for mixture
distributions by identifying the parameters whose densities are equal. Let
\[
\Theta(\theta') \equiv \{\theta \in \Theta \mid f(x;\theta) = f(x;\theta') \text{ almost everywhere on } \mathbb{R}\}.
\]
Furthermore we abbreviate $\Theta(\theta_0)$ as $\Theta_0$. Given $U, V \subset \Theta$, the distance between $U$ and $V$
is defined as
\[
\mathrm{dist}(U,V) \equiv \inf_{\theta \in U}\, \inf_{\theta' \in V} \mathrm{dist}(\theta,\theta').
\]
We now define strong consistency of estimators for mixture distributions as follows.

Definition 1. An estimator $\hat\theta_n$ is strongly consistent iff
\[
\mathrm{Prob}\Big(\lim_{n\to\infty} \mathrm{dist}(\Theta(\hat\theta_n),\Theta_0) = 0\Big) = 1.
\]
In this paper, the two notations "$\mathrm{Prob}(A) = 1$" and "$A$, a.s." ($A$ holds almost surely) will
be used interchangeably.

3 Main results 

3.1 Consistency of penalized maximum likelihood estimator when 
the penalty is imposed on the minimum of the ratios of the 
scale parameters 

Now we define a penalized likelihood. Let $\bar r_n(\cdot)$ denote a function on $(0,1]$ which satisfies
the following assumption and is not identically equal to zero.

Assumption 6. $\exists R < \infty$, $\exists \bar r > 0$, $\exists \delta > 0$, $0 < \exists d < 1$ such that

< fn{y) < min {R, f ■ y^'^^ ■ exp {n'^)}. 
Assumption 6 means that $\bar r_n(y)$ is nonnegative, bounded in $n$ and $y$, and converges
to zero sufficiently fast as $y$ tends to zero. Note that we can take a discontinuous function
as $\bar r_n(y)$. In Corollary 1, we obtain the consistency of a constrained maximum likelihood
estimator by using a discontinuous penalty function.

We define a penalty function $1/r_n(\theta)$ or a reward function $r_n(\theta)$ as
\[
r_n(\theta) \equiv \bar r_n\!\left(\frac{\sigma_{(1)}}{\sigma_{(M)}}\right).
\]
The penalized likelihood function is defined as $g_n(\theta;X) \equiv l(\theta;X) \cdot r_n(\theta)$. The penalized
maximum likelihood estimator is defined as $\hat\theta_{g_n} \equiv \mathrm{argsup}_{\theta \in \Theta}\, g_n(\theta;X)$. As noted in the
introduction, the likelihood $l(\theta;X)$ may increase to infinity as $\sigma_m$ decreases to zero. However, if
the penalty $1/r_n(\theta)$ increases to infinity, that is, $r_n(\theta)$ decreases to zero, the divergence of the
likelihood may be avoided. This happens when a part of the scale parameters decreases
to zero. If all the scale parameters decrease to zero, then the likelihood $l(\theta;X)$ decreases
to zero because a component with a very small scale parameter can make a large
contribution to at most a single observation. Therefore, the existence of the penalty term
may prevent the positive divergence of the likelihood.
Let $b_0 > 0$. In this section, we take $b_n$ as follows:
\[
b_n = b_0 \cdot \exp(-n^d).
\]
We also assume the following conditions.

Assumption 7. There exist a positive real $r_{\Theta_0}$ and a positive integer $N$ such that $r_n(\theta) \ge
r_{\Theta_0}$ for all $\theta \in \Theta_0$ and $n \ge N$.

If $\bar r_n(y)$ is positive and unimodal or increasing, then $r_n(\theta)$ satisfies Assumption 7.
Assumption 7 guarantees that the penalized likelihood is nearly unaffected by the penalty
term for $\theta \in \Theta_0$ when the sample size $n$ is large.

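Assumption 8. There exist real constants $\tilde d$, $c_0$ and $A$ such that $0 < d < \tilde d < 1$, $c_0 > 0$,
$A > 0$ and the following relation holds for all $\theta \in \Theta_{c_n}^{c}$ and $n \in \mathbb{N}$:

where $c_n = c_0 \cdot \exp(-n^{\tilde d})$ and $\Theta_{c_n} = \{\theta \in \Theta \mid \sigma_{(1)} \ge c_n\}$.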

Assumption [8] means that all the scale parameters of ^ G 0c are equally small if 

rnie) > (a(i))*^. 

The assumptions for the penalties are not so restrictive. For example, if we set r„(y) = 
f ■ ?/"~^ ■ e"'' and assume a > M + 1, then f„(?/) satisfies the Assumption [6] and r„(6') = 
f„(^^) satisfies the Assumption [7] and [H 

Then the following theorem holds.

Theorem 1. Suppose that $\mathcal{G}_M$ satisfies Assumptions 1, 2, 3 and 5, and $f(x;\theta_0) \in \mathcal{G}_M \setminus
\mathcal{G}_{M-1}$. Suppose that the penalty function $r_n(\theta)$ satisfies Assumptions 6, 7 and 8. Then
the penalized maximum likelihood estimator $\hat\theta_{g_n}$ is strongly consistent.

A proof of Theorem 1 is given in Section 4.2.

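As a corollary of Theorem 1, we can obtain the consistency of a constrained maximum
likelihood estimator. Let us define the constrained maximum likelihood estimator
restricted to $\Theta_{b_n}$ as
\[
\hat\theta_{b_n} \equiv \mathrm{argsup}_{\theta \in \Theta_{b_n}}\, l(\theta;X).
\]
If we put $\bar r_n(y)$ and $r_n(\theta)$ as
\[
\bar r_n(y) =
\begin{cases} 1 & (y \ge b_n) \\ 0 & (y < b_n) \end{cases},
\qquad
r_n(\theta) = \bar r_n\!\left(\frac{\sigma_{(1)}}{\sigma_{(M)}}\right) =
\begin{cases} 1 & (\sigma_{(1)}/\sigma_{(M)} \ge b_n) \\ 0 & (\sigma_{(1)}/\sigma_{(M)} < b_n) \end{cases},
\tag{3.1.1}
\]
then $\hat\theta_{b_n}$ is equal to the penalized maximum likelihood estimator $\hat\theta_{g_n} = \mathrm{argsup}_{\theta \in \Theta}\, g_n(\theta;X) =
\mathrm{argsup}_{\theta \in \Theta}\, l(\theta;X) \cdot r_n(\theta)$. If we take $0 < d < 1$, then $r_n(\theta)$ given in (3.1.1) satisfies
Assumptions 6, 7 and 8. From this and Theorem 1, we obtain the following corollary.

Corollary 1. Suppose that $\mathcal{G}_M$ satisfies Assumptions 1, 2, 3 and 5, and $f(x;\theta_0) \in
\mathcal{G}_M \setminus \mathcal{G}_{M-1}$. If we take $0 < d < 1$, then the constrained maximum likelihood estimator $\hat\theta_{b_n}$
is strongly consistent.

By Corollary 1, the problem stated in Hathaway (1985) is solved positively.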
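The equivalence between the constrained and the penalized estimator can be made concrete on the log scale. The following minimal sketch implements the hard-threshold reward of (3.1.1); the default values of `b0` and `d` are assumptions for illustration, and the log-likelihood value is taken as an input rather than computed.

```python
import math

def hard_threshold_reward(sigmas, n, b0=1.0, d=0.5):
    """r_n(theta) from (3.1.1): 1 if sigma_(1)/sigma_(M) >= b_n, else 0."""
    b_n = b0 * math.exp(-(n ** d))
    ratio = min(sigmas) / max(sigmas)
    return 1.0 if ratio >= b_n else 0.0

def penalized_log_likelihood(loglik, reward):
    """log g_n = log l + log r_n, with log 0 treated as -infinity."""
    return loglik + math.log(reward) if reward > 0.0 else -math.inf
```

A parameter outside $\Theta_{b_n}$ receives a penalized log-likelihood of $-\infty$ and can never be the maximizer, while inside $\Theta_{b_n}$ the penalized and unpenalized values coincide.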



3.2 Consistency of penalized maximum likelihood estimator when 
the penalties are imposed on the scale parameters them- 
selves 

We also give a consistency result for the penalized maximum likelihood estimator in which
the penalties are imposed on the scale parameters themselves. Let $\bar s_n(\cdot)$ denote a function
on $(0,\infty)$ which satisfies the following assumptions.

Assumption 9. $\bar s_n(y)$ is nonnegative, uniformly bounded and not identically equal to
zero:
\[
0 \le \bar s_n(y) \le S < \infty, \qquad \sup_{y > 0} \bar s_n(y) > 0.
\]

Assumption 10. $\bar s_n(y)$ converges to zero sufficiently fast as $y$ tends to zero:

3s>0, 0<3rf<l s.t. < sup ^4t^ < s ■ exp (n*^) 

y>o y^^ 
Then we define a penalty function $1/s_n(\theta)$ or reward function $s_n(\theta)$ as follows:
\[
s_n(\theta) \equiv \prod_{m=1}^{M} \bar s_n(\sigma_m).
\]
The penalized likelihood function is defined as $h_n(\theta;X) \equiv l(\theta;X) \cdot s_n(\theta)$. The penalized
maximum likelihood estimator is defined as $\hat\theta_{h_n} \equiv \mathrm{argsup}_{\theta \in \Theta}\, h_n(\theta;X)$.
We also assume the following condition.

Assumption 11. There exist a positive real $s_{\Theta_0}$ and a positive integer $N$ such that $s_n(\theta) \ge
s_{\Theta_0}$ for all $\theta \in \Theta_0$ and $n \ge N$.
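The product form of the reward is easiest to handle on the log scale. The sketch below uses a hypothetical componentwise reward $\bar s(y) = \exp(-1/y)$, chosen here only because it is bounded by 1 and vanishes faster than any power of $y$ as $y \to 0$; it is an assumption of this example and not a penalty taken from the paper.

```python
def log_s_bar(y):
    """Log of the hypothetical componentwise reward s_bar(y) = exp(-1/y), y > 0."""
    return -1.0 / y

def log_product_reward(sigmas):
    """log s_n(theta) = sum_m log s_bar(sigma_m)."""
    return sum(log_s_bar(s) for s in sigmas)

def penalized_log_likelihood_h(loglik, sigmas):
    """log h_n(theta; X) = log l(theta; X) + log s_n(theta)."""
    return loglik + log_product_reward(sigmas)
```

With this choice a single scale parameter of order $10^{-6}$ already costs about $10^{6}$ on the log scale, far more than the roughly $-\log \sigma$ that one degenerate component can add to the log-likelihood.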

The assumptions for the penalty are not so restrictive. For example, if we set s„(?/) = 
e~y ■ and assume a,P > 0, then Sn{y) satisfies the Assumption [9] and [10], and 

Sn{0) = Y[m=i Sn{o'm) satisfics the Assumption [TTl 

We now state the consistency of the penalized maximum likelihood estimator when
the penalty is imposed on the scale parameters themselves.

Theorem 2. Suppose that $\mathcal{G}_M$ satisfies Assumptions 1, 2, 3 and 4, and $f(x;\theta_0) \in \mathcal{G}_M \setminus
\mathcal{G}_{M-1}$. Suppose that the penalty function $s_n(\theta)$ satisfies Assumptions 9, 10 and 11. Then
the penalized maximum likelihood estimator $\hat\theta_{h_n}$ is strongly consistent.

The statement of Theorem 2 is an extension of Corollary 1 of Ciuperca, Ridolfi, and Idier
(2003). In their statement, penalties for the location parameters $\mu_1, \ldots, \mu_M$ may be
required. This is because, in their proof, they use a compactification of the parameter space,
but their penalized likelihood is not continuous over the compactified parameter space.
For example, if $\mu_1 \to \infty$, then other components may still exist and hence their penalized
likelihood may not tend to zero.

We give a proof of Theorem 2 in Section 4.3.



4 Proofs 

In this section, we prove Theorems 1 and 2. The organization of this section is as follows.
In Section 4.1, we state some lemmas needed for proving Theorems 1 and 2. Sections 4.2
and 4.3 are devoted to the proofs of Theorems 1 and 2 respectively.

4.1 Some lemmas

We state some lemmas needed for proving Theorems 1 and 2. Proofs of Lemmas 4.1.1,
4.1.2, 4.1.5, 4.1.6, 4.1.7 and 4.1.8 are given in the longer version of Tanaka and Takemura
(2006).



In Tanaka and Takemura (2006), we showed that when the constraint is appropriately
imposed on the minimum of the scale parameters, the constrained maximum
likelihood estimator is strongly consistent under regularity conditions. Let us define the
constrained maximum likelihood estimator restricted to $\Theta_{c_n} = \{\theta \in \Theta \mid \sigma_{(1)} \ge c_n\}$ by
$\hat\theta_{c_n} \equiv \mathrm{argsup}_{\theta \in \Theta_{c_n}}\, l(\theta;X)$.

Lemma 4.1.1. (Tanaka and Takemura (2006)) Suppose that $\mathcal{G}_M$ satisfies
Assumptions 1, 2, 3 and 4, and $f(x;\theta_0) \in \mathcal{G}_M \setminus \mathcal{G}_{M-1}$. Let $c_0 > 0$ and $0 < d < 1$. If $c_n =
c_0 \cdot \exp(-n^d)$, then the constrained maximum likelihood estimator $\hat\theta_{c_n}$ restricted to $\Theta_{c_n}$ is
strongly consistent.

As in the case of uniform mixtures in Tanaka and Takemura (2005), it is readily verified
that if $c_n$ decreases to zero faster than $e^{-n}$, then the consistency of the constrained
maximum likelihood estimator fails. Therefore, the rate obtained in Lemma 4.1.1 is almost
the lower bound of $c_n$ which maintains the strong consistency.

Let
\[
X_{n,1} \equiv \min\{X_1,\ldots,X_n\}, \qquad X_{n,n} \equiv \max\{X_1,\ldots,X_n\}.
\]

Lemma 4.1.2. (Tanaka and Takemura (2006)) Suppose that Assumption 4 is satisfied.
For any real positive constants $A_0 > 0$, $\zeta > 0$, let
\[
A_n \equiv A_0 \cdot n^{\frac{2+\zeta}{\beta-1}}, \tag{4.1.1}
\]
where $\beta$ is defined by Assumption 4. Then
\[
\mathrm{Prob}\big(X_{n,1} < -A_n \ \text{or}\ X_{n,n} > A_n \ \ \text{i.o.}\big) = 0,
\]
where i.o. means "infinitely often". By Lemma 4.1.2, we can bound the behavior of the
minimum and the maximum of the sample with probability 1. In the following sections, we
take $A_0$ large enough to satisfy (4.2.16) and ignore the event $\{X_{n,1} < -A_n \ \text{or}\ X_{n,n} > A_n\}$.

Let $R_n(V)$ denote the number of observations which belong to a set $V \subset \mathbb{R}$:
\[
R_n(V) \equiv \#\{X_i \mid X_i \in V,\ i = 1,\ldots,n\}.
\]
Let $P_0(V)$ denote the probability of $V \subset \mathbb{R}$ under the true density:
\[
P_0(V) \equiv \int_V f(x;\theta_0)\,dx.
\]

Let us consider an interval $[\mu - w_n, \mu + w_n]$ with center $\mu$ and length $2w_n$. If
$w_n = 0$, then $R_n([\mu - w_n, \mu + w_n])$ is almost surely at most 1. In the following lemma, we state that if
$w_n$ decreases to zero faster than a power of $1/n$, then $R_n([\mu - w_n, \mu + w_n]) < 2$ holds for
every $\mu \in \mathbb{R}$ with probability 1.
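The quantity $\sup_{\mu} R_n([\mu - w_n, \mu + w_n])$ that appears in the next lemma is the largest number of observations that fit in a closed interval of length $2w_n$. A minimal sketch of how it can be computed with a sliding window over the sorted sample is given below; the function name is hypothetical.

```python
def max_window_count(sample, w):
    """Largest number of points of `sample` inside some closed interval [mu - w, mu + w]."""
    xs = sorted(sample)
    best = 0
    left = 0
    for right, x in enumerate(xs):
        # Shrink the window until its extreme points are at most 2*w apart.
        while x - xs[left] > 2.0 * w:
            left += 1
        best = max(best, right - left + 1)
    return best
```

For instance, with the sample `[0.0, 1.0, 3.0, 3.5, 10.0]` the function returns 1 for `w = 0.2`, 2 for `w = 0.5` and 4 for `w = 2.0`.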

Lemma 4.1.3. Suppose that Assumption 4 is satisfied. Let $\{w_n\}_{n=1}^{\infty}$ be a sequence of real
numbers which satisfies

lim n^+^' ■An-Wn = 0, (4.1.2) 

n— >oo 

where $\delta' > 0$ and $A_n$ is defined by (4.1.1). Then
\[
\mathrm{Prob}\Big(\sup_{\mu \in \mathbb{R}} R_n([\mu - w_n, \mu + w_n]) > 1 \ \ \text{i.o.}\Big) = 0.
\]
Proof: From Lemma 4.1.2, we ignore the event $\{X_{n,1} < -A_n \ \text{or}\ X_{n,n} > A_n\}$. Then
\[
\Big\{\sup_{\mu \in \mathbb{R}} R_n([\mu - w_n, \mu + w_n]) > 1\Big\}
\;\Longrightarrow\;
\Big\{\sup_{\mu \in [-A_n + w_n,\, A_n - w_n]} R_n([\mu - w_n, \mu + w_n]) > 1\Big\} \quad \text{a.s.}
\tag{4.1.3}
\]
Now we cover $[-A_n, A_n]$ by short intervals of length $4w_n$. Let

= [-An, -An + 4Wn], = [-An + 2Wn, -An + 6w„] , . . . , 

4!-l = [-^n + {kn - 6) ■ Wn, -An + (fc„ - 2) • Wn], 
4;;) = [-An + {kn-4)-Wn, An], 

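where $k_n \equiv \min\{k \in \mathbb{N} \mid k \cdot (2w_n) \ge 2A_n\}$. See Figure 1.

Figure 1: The intervals $I_k^{(n)}$.

Since any half-open interval of length $2w_n$ in $[-A_n, A_n]$ is covered by one of
$I_1^{(n)}, \ldots, I_{k_n}^{(n)}$, the following relation holds.
\[
\Big\{\sup_{\mu \in [-A_n + w_n,\, A_n - w_n]} R_n([\mu - w_n, \mu + w_n]) > 1\Big\}
\;\Longrightarrow\;
\Big\{1 \le \exists k \le k_n,\ R_n(I_k^{(n)}) > 1\Big\}
\tag{4.1.4}
\]
Let $u_0 \equiv \sup_x f(x;\theta_0)$. Because $R_n(I_k^{(n)}) \sim \mathrm{Bin}(n, P_0(I_k^{(n)}))$ and $P_0(I_k^{(n)}) \le 2w_n u_0$, we
obtain
\[
\begin{aligned}
\mathrm{Prob}\big(1 \le \exists k \le k_n,\ R_n(I_k^{(n)}) > 1\big)
&\le \sum_{k=1}^{k_n} \mathrm{Prob}\big(R_n(I_k^{(n)}) > 1\big) \\
&\le k_n \cdot \max_{1 \le k \le k_n} \mathrm{Prob}\big(R_n(I_k^{(n)}) > 1\big) \\
&\le \Big(\frac{A_n}{w_n} + 1\Big) \cdot \sum_{k=2}^{n} \binom{n}{k} (2w_n u_0)^k (1 - 2w_n u_0)^{n-k} \\
&\le \Big(\frac{A_n}{w_n} + 1\Big) \cdot (2nw_n u_0)^2 \cdot \exp(2nw_n u_0).
\end{aligned}
\tag{4.1.5}
\]
From (4.1.2), when we sum the right hand side of (4.1.5) over $n$, the resulting series
converges. Hence by (4.1.3), (4.1.4), (4.1.5) and the Borel-Cantelli lemma, we have
\[
\mathrm{Prob}\Big(\sup_{\mu \in \mathbb{R}} R_n([\mu - w_n, \mu + w_n]) > 1 \ \ \text{i.o.}\Big) = 0.
\]
□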



Figure 2: Each component is bounded by a step function.



Next we bound the component densities from above. For $\beta > 2$, define $\nu(\sigma)$ as



a . 



(4.1.6) 



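Let $1_U(x)$ denote the indicator function of $U \subset \mathbb{R}$.

Lemma 4.1.4. Suppose that Assumption 5 is satisfied. Then the following inequality
holds.
\[
f_m(x;\mu_m,\sigma_m) \le \max\Big\{1_{[\mu_m - \nu(\sigma_m),\, \mu_m + \nu(\sigma_m)]}(x) \cdot \frac{v_0}{\sigma_{(1)}},\ v_0\,\sigma_{(M)}\Big\}, \quad 1 \le m \le M. \tag{4.1.7}
\]

Proof: From Assumption 5, each component is bounded from above as
\[
f_m(x;\mu_m,\sigma_m) \le \max\Big\{1_{[\mu_m - \nu(\sigma_m),\, \mu_m + \nu(\sigma_m)]}(x) \cdot \frac{v_0}{\sigma_m},\ v_0\,\sigma_m\Big\}.
\]
See Figure 2. From this and (2.1.1), we obtain (4.1.7).
□

Let $E_0[\cdot]$ denote the expectation under the true parameter $\theta_0$.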



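Lemma 4.1.5. (Tanaka and Takemura (2006)) Suppose that $\mathcal{G}_M$ satisfies Assumptions
1, 2, 3 and 4, and $f(x;\theta_0) \in \mathcal{G}_M \setminus \mathcal{G}_{M-1}$. Then there exist real constants $\kappa, \lambda > 0$ such that
\[
E_0[\log\{f(x;\theta) + \kappa\}] + \lambda < E_0[\log f(x;\theta_0)] \tag{4.1.8}
\]
for all $f(x;\theta) \in \mathcal{G}_{M-1}$.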
Fix arbitrary $\kappa_0 > 0$, which corresponds to $\kappa$ in Lemma 4.1.5. For $\beta > 1$, define $\nu(\sigma)$
as



In a manner similar to the proof of Lemma 4.1.4, we can show the following lemma.

Lemma 4.1.6. (Tanaka and Takemura (2006)) Suppose that Assumption 4 is satisfied.
Then the following inequality holds.
\[
f_m(x;\mu_m,\sigma_m) \le \max\Big\{1_{[\mu_m - \nu(\sigma_m),\, \mu_m + \nu(\sigma_m)]}(x) \cdot \frac{v_0}{\sigma_m},\ \kappa_0\Big\}.
\]
Lemma 4.1.6 bounds the tails of a density in a different way than Lemma 4.1.4.
On the one hand, in Lemma 4.1.4 the tails of a density are bounded by the value of the
scale parameter, and Assumption 5 is needed because $\beta$ should be larger than 2. On the
other hand, in Lemma 4.1.6 the tails of a density are bounded by a constant and only
Assumption 4 is needed. Lemma 4.1.4 will be used to prove Theorem 1. Lemma 4.1.6 will
be used to prove Theorem 1 and Theorem 2. Therefore, Theorem 1 needs Assumption 5,
which is stronger than Assumption 4.
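The step-function bounds of Lemmas 4.1.4 and 4.1.6 can be motivated by a short scaling computation. The following is only a sketch: the half-widths $\nu_5$ and $\nu_4$ are hypothetical names for one admissible choice obtained directly from Assumptions 5 and 4, and they are not claimed to coincide with the definitions of $\nu(\sigma)$ used in the text.

```latex
% Scaling the tail bound of Assumption 4 or 5 to a component with parameters (\mu, \sigma):
\[
f_m(x;\mu,\sigma) = \frac{1}{\sigma}\, f_m\!\left(\frac{x-\mu}{\sigma};0,1\right)
\le \min\left\{ \frac{v_0}{\sigma},\; v_1\,\sigma^{\beta-1}\,|x-\mu|^{-\beta} \right\}.
\]
% Tail bounded by v_0 \sigma, as in Lemma 4.1.4:
\[
|x-\mu| \ge \nu_5(\sigma) := \left(\frac{v_1}{v_0}\right)^{1/\beta} \sigma^{(\beta-2)/\beta}
\;\Longrightarrow\;
v_1\,\sigma^{\beta-1}\,|x-\mu|^{-\beta} \le v_0\,\sigma .
\]
% Tail bounded by the constant \kappa_0, as in Lemma 4.1.6:
\[
|x-\mu| \ge \nu_4(\sigma) := \left(\frac{v_1}{\kappa_0}\right)^{1/\beta} \sigma^{(\beta-1)/\beta}
\;\Longrightarrow\;
v_1\,\sigma^{\beta-1}\,|x-\mu|^{-\beta} \le \kappa_0 .
\]
```

Under this choice the first half-width tends to zero with $\sigma$ only when $\beta > 2$, whereas the second does so for every $\beta > 1$, which mirrors the roles of Assumptions 5 and 4 described above.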

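Let $\mathcal{K}$ be a subset of $\{1,2,\ldots,M\}$ and let $|\mathcal{K}|$ denote the number of elements in
$\mathcal{K}$. Denote by $\theta_{\mathcal{K}}$ a subvector of $\theta \in \Theta$ consisting of the components in $\mathcal{K}$. Then the
parameter space of subprobability measures consisting of the components in $\mathcal{K}$ is
\[
\Theta_{\mathcal{K}} \equiv \Big\{\theta_{\mathcal{K}} \;\Big|\; \theta \in \Theta,\ \sum_{m \in \mathcal{K}} \alpha_m \le 1\Big\}.
\]
The corresponding density and the set of subprobability densities are denoted by
\[
f_{\mathcal{K}}(x;\theta_{\mathcal{K}}) \equiv \sum_{k \in \mathcal{K}} \alpha_k f_k(x;\mu_k,\sigma_k), \qquad
\mathcal{G}_{\mathcal{K}} \equiv \{f_{\mathcal{K}}(x;\theta_{\mathcal{K}}) \mid \theta_{\mathcal{K}} \in \Theta_{\mathcal{K}}\}.
\]
Then $\mathcal{G}_K$, the set of subprobability densities with no more than $K$ components, can be
represented as
\[
\mathcal{G}_K = \bigcup_{|\mathcal{K}| \le K} \mathcal{G}_{\mathcal{K}} \qquad (1 \le K \le M).
\]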

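The following lemma follows from the bounded convergence theorem.

Lemma 4.1.7. (Tanaka and Takemura (2006)) Let $\Gamma_{\mathcal{K}}$ denote any compact subset of
$\Theta_{\mathcal{K}}$. For any real constant $\kappa_0 > 0$ and any point $\theta_{\mathcal{K}} \in \Gamma_{\mathcal{K}}$, the following equality holds
under Assumptions 1 and 2:
\[
\lim_{\rho \to 0} E_0[\log\{f_{\mathcal{K}}(x;\theta_{\mathcal{K}},\rho) + \kappa_0\}] = E_0[\log\{f_{\mathcal{K}}(x;\theta_{\mathcal{K}}) + \kappa_0\}].
\]
The following lemma follows from Lemma 4.1.7.

Lemma 4.1.8. (Tanaka and Takemura (2006)) Let $\kappa_0$ and $\lambda_0$ be real constants which
correspond to $\kappa$ and $\lambda$ in Lemma 4.1.5. Let $\Gamma_{\mathcal{K}}$ denote any compact subset of $\Theta_{\mathcal{K}}$.
Let $\mathcal{B}(\theta_{\mathcal{K}}, \rho(\theta_{\mathcal{K}}))$ denote the open ball with center $\theta_{\mathcal{K}}$ and radius $\rho(\theta_{\mathcal{K}})$. Suppose that
Assumptions 1 and 2 hold. Then $\Gamma_{\mathcal{K}}$ can be covered by a finite number of balls
$\mathcal{B}(\theta_{\mathcal{K}}^{(1)}, \rho(\theta_{\mathcal{K}}^{(1)})), \ldots, \mathcal{B}(\theta_{\mathcal{K}}^{(S)}, \rho(\theta_{\mathcal{K}}^{(S)}))$ such that
\[
E_0[\log\{f_{\mathcal{K}}(x;\theta_{\mathcal{K}}^{(s)},\rho(\theta_{\mathcal{K}}^{(s)})) + \kappa_0\}] + \lambda_0 < E_0[\log f(x;\theta_0)], \quad (s = 1,\ldots,S).
\]

4.2 Proof of Theorem 1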

First, we partition the parameter space into two sets. Then the proof of the strong
consistency of the penalized maximum likelihood estimator is also partitioned into two parts.
The proof for one set is obtained immediately by applying the result of Lemma 4.1.1.

4.2.1 Partitioning the parameter space

Let $d$ be a constant defined by Assumption 6. Let $\tilde d$ be a constant defined by Assumption 8.
Define $c_n = c_0 \cdot \exp(-n^{\tilde d})$ and $\Theta_{c_n} = \{\theta \in \Theta \mid \sigma_{(1)} \ge c_n\}$. Then the parameter space $\Theta$ is
divided into two sets:
\[
\Theta = \Theta_{c_n} \cup \Theta_{c_n}^{c},
\]
where $\Theta_{c_n}^{c} = \{\theta \in \Theta \mid \sigma_{(1)} < c_n\}$ is the complement of $\Theta_{c_n}$. From Assumption 6, the
reward term $r_n(\theta)$ is bounded. Furthermore, Assumption 7 indicates that the asymptotic
behavior is not affected by the penalty term around $\Theta_0$. Therefore the penalized maximum
likelihood estimator over $\Theta_{c_n}$ is strongly consistent by Lemma 4.1.1. If the maximum of
the penalized likelihood function over $\Theta_{c_n}^{c}$ is very small, then the penalized maximum likelihood
estimator over the whole parameter space $\Theta$ is strongly consistent. This takes care of $\Theta_{c_n}$,
and from now on we consider the behavior of the penalized likelihood over $\Theta_{c_n}^{c}$.
Furthermore, we divide $\Theta_{c_n}^{c}$ into two sets:
\[
\Theta_{c_n}^{c} = \Phi_n \cup \Psi_n,
\]



where 



^ {eeSZ\-^<l}. (4,2,2) 



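For $\theta \in \Phi_n$, all the scale parameters are very small. On the other hand, for $\theta \in \Psi_n$, the penalty
$1/r_n(\theta)$ is very large and has a large contribution relative to the likelihood. Therefore,
intuitively, it seems that the maximum of the penalized likelihood function over $\Theta_{c_n}^{c} = \Phi_n \cup \Psi_n$ is
very small. We are going to prove that this is true.
By the argument used in Wald (1949), in order to prove Theorem 1, it suffices to prove
the following two equations.
\[
\lim_{n\to\infty} \frac{\sup_{\theta \in \Phi_n} \{\prod_{i=1}^{n} f(X_i;\theta)\} \cdot r_n(\theta)}{\{\prod_{i=1}^{n} f(X_i;\theta_0)\} \cdot r_n(\theta_0)} = 0, \quad \text{a.s.} \tag{4.2.3}
\]
\[
\lim_{n\to\infty} \frac{\sup_{\theta \in \Psi_n} \{\prod_{i=1}^{n} f(X_i;\theta)\} \cdot r_n(\theta)}{\{\prod_{i=1}^{n} f(X_i;\theta_0)\} \cdot r_n(\theta_0)} = 0, \quad \text{a.s.} \tag{4.2.4}
\]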

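4.2.2 Proof of (4.2.3) for $\Phi_n$

By the law of large numbers, we have
\[
\lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} \log f(X_i;\theta_0) = E_0[\log f(x;\theta_0)], \quad \text{a.s.}
\]
Furthermore, by Assumptions 6 and 7 we obtain
\[
\lim_{n\to\infty} \frac{1}{n} \log r_n(\theta_0) = 0.
\]
Therefore (4.2.3) is implied by
\[
\limsup_{n\to\infty} \frac{1}{n} \cdot \sup_{\theta \in \Phi_n} \Big\{\sum_{i=1}^{n} \log f(X_i;\theta) + \log r_n(\theta)\Big\} < E_0[\log f(x;\theta_0)] \quad \text{a.s.} \tag{4.2.5}
\]
Consequently, in order to prove (4.2.3), it suffices to prove (4.2.5).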
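From Assumption 8 and (4.2.2), we have
\[
\sigma_{(1)} \le \sigma_{(M)} \le \frac{\sigma_{(1)}^{A}}{b_n}, \quad \theta \in \Phi_n, \tag{4.2.6}
\]
where $b_n = b_0 \cdot \exp(-n^d)$ and the first inequality is derived from (2.1.1). Because $\sigma_{(1)} <
c_n = \exp(-n^{\tilde d})$, we obtain
\[
\sigma_m \le \exp(n^d - A \cdot n^{\tilde d}), \quad 1 \le m \le M,\ \theta \in \Phi_n. \tag{4.2.7}
\]
Note that $0 < d < \tilde d < 1$, $A > 0$ by Assumption 8. Define
\[
J(\theta) \equiv \bigcup_{m=1}^{M} [\mu_m - \nu(\sigma_m),\ \mu_m + \nu(\sigma_m)]. \tag{4.2.8}
\]
Then the following lemma holds.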
Lemma 4.2.1.
\[
\mathrm{Prob}\Big(\sup_{\theta \in \Phi_n} R_n(J(\theta)) > M \ \ \text{i.o.}\Big) = 0. \tag{4.2.9}
\]
Proof: We prove Lemma 4.2.1 by using Lemma 4.1.3. Let $w_n = \nu(\exp(n^d - A \cdot n^{\tilde d}))$.
Because of (4.1.6) and $\beta > 2$ by Assumption 5, the assumption (4.1.2) of Lemma 4.1.3 is
satisfied. From (4.2.7) and (4.2.8), we have
\[
\Big\{\sup_{\theta \in \Phi_n} R_n(J(\theta)) > M\Big\} \;\Longrightarrow\; \Big\{\sup_{\mu \in \mathbb{R}} R_n([\mu - w_n, \mu + w_n]) > 1\Big\}.
\]
Therefore, by Lemma 4.1.3, we obtain (4.2.9). □

We now state the following inequality, in order to bound the likelihood.

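Lemma 4.2.2. For $\theta \in \Phi_n$,
\[
\sum_{i=1}^{n} \log f(X_i;\theta) \le R_n(J(\theta)) \cdot \log\frac{v_0}{\sigma_{(1)}} + R_n(J(\theta)^c) \cdot \log v_0\sigma_{(M)}.
\]
Proof: From Lemma 4.1.4 and (2.1.1), for $\theta \in \Phi_n$ we obtain
\[
\begin{aligned}
\sum_{i=1}^{n} \log f(X_i;\theta)
&= \sum_{i=1}^{n} \log\Big\{\sum_{m=1}^{M} \alpha_m f_m(X_i;\mu_m,\sigma_m)\Big\} \\
&\le \sum_{i=1}^{n} \max_{m=1,\ldots,M} \log f_m(X_i;\mu_m,\sigma_m) \\
&\le \sum_{i=1}^{n} \max_{m=1,\ldots,M} \log \max\Big\{1_{[\mu_m - \nu(\sigma_m),\, \mu_m + \nu(\sigma_m)]}(X_i) \cdot \frac{v_0}{\sigma_{(1)}},\ v_0\sigma_{(M)}\Big\} \\
&= R_n(J(\theta)) \cdot \log\frac{v_0}{\sigma_{(1)}} + R_n(J(\theta)^c) \cdot \log v_0\sigma_{(M)},
\end{aligned}
\]
where the last equality uses $v_0/\sigma_{(1)} \ge v_0\sigma_{(M)}$, which holds for $\theta \in \Phi_n$ once $n$ is large
enough that the bound in (4.2.7) is at most 1.
□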
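By Lemma 4.2.2 and Assumption 6, we obtain for $\theta \in \Phi_n$
\[
\sum_{i=1}^{n} \log f(X_i;\theta) + \log r_n(\theta) \le R_n(J(\theta)) \cdot \log\frac{v_0}{\sigma_{(1)}} + R_n(J(\theta)^c) \cdot \log v_0\sigma_{(M)} + \log R.
\]
Furthermore, from (4.2.6) and $R_n(J(\theta)) + R_n(J(\theta)^c) = n$, we have for $\theta \in \Phi_n$
\[
\begin{aligned}
\sum_{i=1}^{n} \log f(X_i;\theta) + \log r_n(\theta)
&\le R_n(J(\theta)) \cdot \log\frac{v_0}{\sigma_{(1)}} + R_n(J(\theta)^c) \cdot \log\frac{v_0\,\sigma_{(1)}^{A}}{b_n} + \log R \\
&= \big(A \cdot R_n(J(\theta)^c) - R_n(J(\theta))\big) \cdot \log\sigma_{(1)} - R_n(J(\theta)^c) \cdot \log b_n + n\log v_0 + \log R.
\end{aligned}
\]
Because $b_n = b_0 \cdot e^{-n^d}$, we obtain for $\theta \in \Phi_n$
\[
\begin{aligned}
\sum_{i=1}^{n} \log f(X_i;\theta) + \log r_n(\theta)
&\le \big(A \cdot R_n(J(\theta)^c) - R_n(J(\theta))\big) \cdot \log\sigma_{(1)} + R_n(J(\theta)^c) \cdot (n^d - \log b_0) + n\log v_0 + \log R \\
&\le \big(A \cdot R_n(J(\theta)^c) - R_n(J(\theta))\big) \cdot \log\sigma_{(1)} + n \cdot (n^d + |\log b_0| + \log v_0) + \log R.
\end{aligned}
\tag{4.2.10}
\]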



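By Lemma 4.2.1 we obtain
\[
\begin{aligned}
1 &= \mathrm{Prob}\Big(\bigcup_{N=1}^{\infty} \bigcap_{n=N}^{\infty} \Big\{\sup_{\theta \in \Phi_n} R_n(J(\theta)) \le M\Big\}\Big) \\
&= \mathrm{Prob}\Big(\bigcup_{N=1}^{\infty} \bigcap_{n=N}^{\infty} \Big\{\sup_{\theta \in \Phi_n} R_n(J(\theta)) \le M\Big\} \cap \Big\{\inf_{\theta \in \Phi_n} R_n(J(\theta)^c) \ge n - M\Big\}\Big) \\
&\le \mathrm{Prob}\Big(\bigcup_{N=1}^{\infty} \bigcap_{n=N}^{\infty} \Big\{\inf_{\theta \in \Phi_n} \big(A \cdot R_n(J(\theta)^c) - R_n(J(\theta))\big) \ge A \cdot (n - M) - M\Big\}\Big) \le 1.
\end{aligned}
\tag{4.2.11}
\]
From (4.2.11), the inequality $\inf_{\theta \in \Phi_n} \big(A \cdot R_n(J(\theta)^c) - R_n(J(\theta))\big) \ge A \cdot (n - M) - M$ holds
almost surely except for a finite number of $n$. Therefore, we ignore the event
$\inf_{\theta \in \Phi_n} \big(A \cdot R_n(J(\theta)^c) - R_n(J(\theta))\big) < A \cdot (n - M) - M$. Because $\sigma_{(1)} < c_n = c_0 \cdot e^{-n^{\tilde d}}$ and (4.2.10), for
all sufficiently large $n$ such that $c_n < 1$ and $A \cdot (n - M) - M > 0$ hold, we have
\[
\sup_{\theta \in \Phi_n} \Big\{\sum_{i=1}^{n} \log f(X_i;\theta) + \log r_n(\theta)\Big\}
\le (A \cdot (n - M) - M) \cdot (-n^{\tilde d} + \log c_0) + n \cdot (n^d + |\log b_0| + \log v_0) + \log R \quad \text{a.s.}
\]
From Assumption 8, the first term of the right hand side of the above inequality is the
main term and diverges to $-\infty$ as $n$ increases, since it is of order $-n^{1+\tilde d}$ while the second
term is of order $n^{1+d}$ with $d < \tilde d$. Therefore, we obtain (4.2.5):
\[
\limsup_{n\to\infty} \frac{1}{n} \cdot \sup_{\theta \in \Phi_n} \Big\{\sum_{i=1}^{n} \log f(X_i;\theta) + \log r_n(\theta)\Big\} = -\infty \quad \text{a.s.}
\]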

4.2.3 Proof of (4.2.4) for $\Psi_n$

The outline of the proof of (4.2.4) is as follows. First, we partition the parameter space
$\Psi_n$ into finitely many subsets $\Psi_{n,\mathcal{K},s}$, depending on which scale parameters are smaller than $c_n$.
Then, by using Lemma 4.1.3, we can show that the components with $\sigma_m < c_n$ do not
contribute to the likelihood at more than $M$ data points and that these contributions are canceled
out by the penalty term. Therefore, from Lemmas 4.1.4, 4.1.6 and 4.1.8, we obtain the
following inequality for each $\Psi_{n,\mathcal{K},s}$.
\[
\limsup_{n\to\infty}\ \sup_{\theta \in \Psi_{n,\mathcal{K},s}} \frac{1}{n}\Big\{\sum_{i=1}^{n} \log f(X_i;\theta) + \log r_n(\theta)\Big\} \le E_0[\log f(x;\theta_0)] - \lambda_0, \quad \text{a.s.}
\]
This leads to (4.2.4).

Setting up constants. For $\kappa$, $\lambda$ satisfying (4.1.8), let $\kappa_0$, $\lambda_0$ be real constants such that
< 4ko < K , < 4Ao < A , — > max{croi, . . . ,(ToA/}- (4.2.12) 
Note that $4\kappa_0$, $4\lambda_0$ also satisfy (4.1.8). Define

B = — (4.2.13) 

If $\sigma_m > B$, then the density of the $m$-th component is almost flat and makes little
contribution to the likelihood. In the following argument, we partition the parameter
space according to this property.

Because $\{c_n\}$ is decreasing to zero, by replacing $c_0$ by some $c_n$ if necessary, we can
assume without loss of generality that $c_0$ is sufficiently small to satisfy the following
conditions,

{vo/cof > e, 

Co < min{aoi, . . . ,aoA/}, 

3M ■ Mo ■ 2z/(co) ■ |log2fi:o| < Ao, 

3 ■ 2M ■ uq ■ ^{vq/cq) ■ log(fo/co) < Ao , 



where $\tilde\beta = (\beta - 1)/\beta$ and



^y) ^ 2 ■ ( ^ V ■ (t;o ■ (M + 1))'^ -(-Y. (4.2.15) 



.1^0/ \yJ 

Take $A_0 > 0$ sufficiently large such that

Po((-oo, -Ao) U (Ao, oo)) ■ log ( ''°^'°^"||^''° ) < Ao. (4.2.16) 

2+C 

Let ^ = (— CX3, —Aq) U {Aq, oo) and An = Aq ■ nt>-^ as in Lemma [4.1.21 

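Partitioning the parameter space. Partition $\{1,\ldots,M\}$ into disjoint subsets $\mathcal{K}_{\sigma<c_n}$,
$\mathcal{K}_{c_n\le\sigma<c_0}$, $\mathcal{K}_{\sigma>B}$, $\mathcal{K}_{|\mu|>A_0}$ and $\mathcal{K}_R$. For any given $\mathcal{K}_{\sigma<c_n}$, $\mathcal{K}_{c_n\le\sigma<c_0}$, $\mathcal{K}_{\sigma>B}$, $\mathcal{K}_{|\mu|>A_0}$ and
$\mathcal{K}_R$, we define a subset of $\Psi_n$ by
\[
\begin{aligned}
\Psi_{n,\mathcal{K}} \equiv \{\theta \in \Psi_n \mid\ & \sigma_m < c_n,\ (m \in \mathcal{K}_{\sigma<c_n}); \\
& c_n \le \sigma_m < c_0,\ (m \in \mathcal{K}_{c_n\le\sigma<c_0}); \\
& \sigma_m > B,\ (m \in \mathcal{K}_{\sigma>B}); \\
& c_0 \le \sigma_m \le B,\ |\mu_m| > A_0,\ (m \in \mathcal{K}_{|\mu|>A_0}); \\
& c_0 \le \sigma_m \le B,\ |\mu_m| \le A_0,\ (m \in \mathcal{K}_R)\}.
\end{aligned}
\]
The method of partitioning the parameter space is the same as in Section 4.3.2 of
Tanaka and Takemura (2006) except for $\mathcal{K}_{\sigma<c_n}$. We will show that the contributions of
the components in $\mathcal{K}_{\sigma<c_n}$ to the likelihood are canceled out by the penalty term.

As above, it suffices to prove that for each choice of disjoint subsets $\mathcal{K}_{\sigma<c_n}$, $\mathcal{K}_{c_n\le\sigma<c_0}$,
$\mathcal{K}_{\sigma>B}$, $\mathcal{K}_{|\mu|>A_0}$ and $\mathcal{K}_R$,
\[
\lim_{n\to\infty} \frac{\sup_{\theta \in \Psi_{n,\mathcal{K}}} \prod_{i=1}^{n} f(X_i;\theta) \cdot r_n(\theta)}{\prod_{i=1}^{n} f(X_i;\theta_0) \cdot r_n(\theta_0)} = 0, \quad \text{a.s.}
\]
We fix $\mathcal{K}_{\sigma<c_n}$, $\mathcal{K}_{c_n\le\sigma<c_0}$, $\mathcal{K}_{\sigma>B}$, $\mathcal{K}_{|\mu|>A_0}$ and $\mathcal{K}_R$ from now on.

Next we consider coverings of $\Theta_{\mathcal{K}_R}$. The following lemma follows immediately from
Lemma 4.1.8 and compactness of $\Theta_{\mathcal{K}_R}$.

Lemma 4.2.3. Let $\mathcal{B}(\theta,\rho(\theta))$ denote the open ball with center $\theta$ and radius $\rho(\theta)$. Then
$\Theta_{\mathcal{K}_R}$ can be covered by a finite number of balls $\mathcal{B}(\theta_{\mathcal{K}_R}^{(1)}, \rho(\theta_{\mathcal{K}_R}^{(1)})), \ldots, \mathcal{B}(\theta_{\mathcal{K}_R}^{(S)}, \rho(\theta_{\mathcal{K}_R}^{(S)}))$ such
that
\[
E_0[\log\{f_{\mathcal{K}_R}(x;\theta_{\mathcal{K}_R}^{(s)},\rho(\theta_{\mathcal{K}_R}^{(s)})) + \kappa_0\}] + \lambda_0 < E_0[\log f(x;\theta_0)], \quad (s = 1,\ldots,S).
\]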


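Based on Lemma 4.2.3, we partition $\Theta'_{n,\mathcal{K}}$. Recall that we denote by $\theta_{\mathcal{K}_R}$ the subvector
of $\theta \in \Theta$ consisting of the components in $\mathcal{K}_R$. Define a subset of $\Theta'_{n,\mathcal{K}}$ by

$$\Theta'_{n,\mathcal{K},s} = \{\theta \in \Theta'_{n,\mathcal{K}} \mid \theta_{\mathcal{K}_R} \in \mathcal{B}(\theta^{(s)}_{\mathcal{K}_R}, \rho(\theta^{(s)}_{\mathcal{K}_R}))\}.$$

Then $\Theta'_{n,\mathcal{K}}$ is covered by $\Theta'_{n,\mathcal{K},1}, \ldots, \Theta'_{n,\mathcal{K},S}$:

$$\Theta'_{n,\mathcal{K}} = \bigcup_{s=1}^{S} \Theta'_{n,\mathcal{K},s}.$$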



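Again it suffices to prove that for each choice of $\mathcal{K}_{\sigma<c_n}$, $\mathcal{K}_{c_n\le\sigma<c_0}$, $\mathcal{K}_{\sigma>B}$, $\mathcal{K}_{|\mu|>A_0}$, $\mathcal{K}_R$
and $s$,

$$\lim_{n\to\infty} \frac{\sup_{\theta\in\Theta'_{n,\mathcal{K},s}} \prod_{i=1}^n f(X_i;\theta) \cdot r_n(\theta)}{\prod_{i=1}^n f(X_i;\theta_0) \cdot r_n(\theta_0)} = 0, \quad \text{a.s.} \qquad (4.2.17)$$

We fix $\mathcal{K}_{\sigma<c_n}$, $\mathcal{K}_{c_n\le\sigma<c_0}$, $\mathcal{K}_{\sigma>B}$, $\mathcal{K}_{|\mu|>A_0}$, $\mathcal{K}_R$ and $s$ from now on. By Assumptions 6 and 7
and the law of large numbers, (4.2.17) is implied by

$$\limsup_{n\to\infty} \frac{1}{n} \sup_{\theta\in\Theta'_{n,\mathcal{K},s}} \Big\{\sum_{i=1}^n \log f(X_i;\theta) + \log r_n(\theta)\Big\} < E_0[\log f(x;\theta_0)], \quad \text{a.s.} \qquad (4.2.18)$$

Therefore it suffices to prove (4.2.18).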
Bounding the penalized likelihood by six terms. The outline of the rest of our
proof is as follows. First, we bound the likelihood by three terms in Lemma 4.2.4. Next, we
bound the first of these three terms by four terms in Lemma 4.2.5.
Finally, from Lemma 4.2.4 and Lemma 4.2.5, we bound the penalized likelihood by six
terms in Lemma 4.2.6.
Define $J_{\sigma<c_n}(\theta)$ as

$$J_{\sigma<c_n}(\theta) = \bigcup_{m\in\mathcal{K}_{\sigma<c_n}} [\mu_m - \nu(\sigma_m), \mu_m + \nu(\sigma_m)]. \qquad (4.2.19)$$

Let $\mathcal{K}_{\sigma\ge c_n} = \{1, \ldots, M\} \setminus \mathcal{K}_{\sigma<c_n}$. Then the following lemma holds.
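Lemma 4.2.4. For $\theta \in \Theta'_{n,\mathcal{K},s}$,

$$
\begin{aligned}
\sum_{i=1}^n \log f(X_i;\theta)
\le{} & \sum_{i=1}^n \log\Big\{\sum_{m\in\mathcal{K}_{\sigma\ge c_n}} \alpha_m f_m(X_i;\theta_m) + \kappa_0\Big\}
 + R_n(J_{\sigma<c_n}(\theta)) \cdot \log\frac{v_0}{\kappa_0} \\
& + R_n(J_{\sigma<c_n}(\theta)) \cdot \log\frac{1}{\sigma_{(1)}}. \qquad (4.2.20)
\end{aligned}
$$

Proof: For $\theta \in \Theta'_{n,\mathcal{K},s} \subset \Theta'_{n,\mathcal{K}} \subset \Theta'_n$, from Lemma 4.1.6 the following inequalities hold.

$$
\begin{aligned}
\sum_{i=1}^n \log f(X_i;\theta)
={} & \sum_{X_i\in J_{\sigma<c_n}(\theta)} \log f(X_i;\theta) + \sum_{X_i\notin J_{\sigma<c_n}(\theta)} \log f(X_i;\theta) \\
\le{} & R_n(J_{\sigma<c_n}(\theta)) \cdot \log\Big\{\max_{1\le m\le M} \frac{v_0}{\sigma_m}\Big\}
 + \sum_{X_i\notin J_{\sigma<c_n}(\theta)} \log\Big\{\sum_{m\in\mathcal{K}_{\sigma\ge c_n}} \alpha_m f_m(X_i;\theta_m) + \kappa_0\Big\} \\
\le{} & \sum_{i=1}^n \log\Big\{\sum_{m\in\mathcal{K}_{\sigma\ge c_n}} \alpha_m f_m(X_i;\theta_m) + \kappa_0\Big\}
 - \sum_{X_i\in J_{\sigma<c_n}(\theta)} \log \kappa_0
 + R_n(J_{\sigma<c_n}(\theta)) \cdot \log\frac{v_0}{\sigma_{(1)}} \\
={} & \sum_{i=1}^n \log\Big\{\sum_{m\in\mathcal{K}_{\sigma\ge c_n}} \alpha_m f_m(X_i;\theta_m) + \kappa_0\Big\}
 + R_n(J_{\sigma<c_n}(\theta)) \cdot \log\frac{v_0}{\kappa_0}
 + R_n(J_{\sigma<c_n}(\theta)) \cdot \log\frac{1}{\sigma_{(1)}}.
\end{aligned}
$$

□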
Define $J_{c_n\le\sigma<c_0}(\theta)$ as

$$J_{c_n\le\sigma<c_0}(\theta) = \bigcup_{m\in\mathcal{K}_{c_n\le\sigma<c_0}} [\mu_m - \nu(\sigma_m), \mu_m + \nu(\sigma_m)].$$

For the first term of (4.2.20), the following lemma holds.
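Lemma 4.2.5. The following inequality holds for $\theta \in \Theta'_{n,\mathcal{K},s}$.

$$
\begin{aligned}
\sum_{i=1}^n \log\Big\{\sum_{m\in\mathcal{K}_{\sigma\ge c_n}} \alpha_m f_m(X_i;\theta_m) + \kappa_0\Big\}
\le{} & \sum_{i=1}^n \log\big\{f_{\mathcal{K}_R}(X_i; \theta^{(s)}_{\mathcal{K}_R}, \rho(\theta^{(s)}_{\mathcal{K}_R})) + 4\kappa_0\big\} \\
& + R_n(\mathcal{A}_0) \cdot \log\left(\frac{v_0/c_0 + 3\kappa_0}{4\kappa_0}\right) \\
& + R_n(J_{c_n\le\sigma<c_0}(\theta)) \cdot (-\log 2\kappa_0) \\
& + \sum_{X_i\in J_{c_n\le\sigma<c_0}(\theta)} \log\{f(X_i;\theta) + \kappa_0\}. \qquad (4.2.21)
\end{aligned}
$$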

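Proof: Let $\mathcal{K}_{\sigma\ge c_0} = \{1, \ldots, M\} \setminus \{\mathcal{K}_{\sigma<c_n} \cup \mathcal{K}_{c_n\le\sigma<c_0}\}$ and
$\mathcal{K}_{c_0\le\sigma\le B} = \mathcal{K}_{\sigma\ge c_0} \setminus \mathcal{K}_{\sigma>B}$. For $x \notin J_{c_n\le\sigma<c_0}(\theta)$,
$\sum_{m\in\mathcal{K}_{\sigma\ge c_n}} \alpha_m f_m(x;\theta_m) \le f_{\mathcal{K}_{\sigma\ge c_0}}(x; \theta_{\mathcal{K}_{\sigma\ge c_0}}) + \kappa_0$ holds. Therefore

$$
\begin{aligned}
\sum_{i=1}^n \log\Big\{\sum_{m\in\mathcal{K}_{\sigma\ge c_n}} \alpha_m f_m(X_i;\theta_m) + \kappa_0\Big\}
\le{} & \sum_{X_i\in J_{c_n\le\sigma<c_0}(\theta)} \log\{f(X_i;\theta) + \kappa_0\} \\
& + \sum_{X_i\notin J_{c_n\le\sigma<c_0}(\theta)} \log\{f_{\mathcal{K}_{\sigma\ge c_0}}(X_i; \theta_{\mathcal{K}_{\sigma\ge c_0}}) + 2\kappa_0\} \\
={} & \sum_{i=1}^n \log\{f_{\mathcal{K}_{\sigma\ge c_0}}(X_i; \theta_{\mathcal{K}_{\sigma\ge c_0}}) + 2\kappa_0\} \\
& + \sum_{X_i\in J_{c_n\le\sigma<c_0}(\theta)} \big[\log\{f(X_i;\theta) + \kappa_0\} - \log\{f_{\mathcal{K}_{\sigma\ge c_0}}(X_i; \theta_{\mathcal{K}_{\sigma\ge c_0}}) + 2\kappa_0\}\big]. \qquad (4.2.22)
\end{aligned}
$$

Consider the second term on the right-hand side. We have

$$
\begin{aligned}
& \sum_{X_i\in J_{c_n\le\sigma<c_0}(\theta)} \big[\log\{f(X_i;\theta) + \kappa_0\} - \log\{f_{\mathcal{K}_{\sigma\ge c_0}}(X_i; \theta_{\mathcal{K}_{\sigma\ge c_0}}) + 2\kappa_0\}\big] \\
& \qquad \le \sum_{X_i\in J_{c_n\le\sigma<c_0}(\theta)} \log\{f(X_i;\theta) + \kappa_0\} - R_n(J_{c_n\le\sigma<c_0}(\theta)) \cdot \log 2\kappa_0.
\end{aligned}
$$

This takes care of the third and the fourth term of (4.2.21). Now consider the first term
on the right-hand side of (4.2.22). Because of (4.2.13), we obtain

$$\sum_{i=1}^n \log\{f_{\mathcal{K}_{\sigma\ge c_0}}(X_i; \theta_{\mathcal{K}_{\sigma\ge c_0}}) + 2\kappa_0\} \le \sum_{i=1}^n \log\{f_{\mathcal{K}_{c_0\le\sigma\le B}}(X_i; \theta_{\mathcal{K}_{c_0\le\sigma\le B}}) + 3\kappa_0\}.$$

Note that $\mathcal{A}_0 = \{x \in \mathbb{R} \mid |x| > A_0\}$ and $\mathcal{K}_R = \mathcal{K}_{c_0\le\sigma\le B} \setminus \mathcal{K}_{|\mu|>A_0}$.
Therefore we obtain

$$
\begin{aligned}
& \sum_{i=1}^n \log\{f_{\mathcal{K}_{c_0\le\sigma\le B}}(X_i; \theta_{\mathcal{K}_{c_0\le\sigma\le B}}) + 3\kappa_0\} \\
& \quad = \sum_{X_i\notin\mathcal{A}_0} \log\{f_{\mathcal{K}_{c_0\le\sigma\le B}}(X_i; \theta_{\mathcal{K}_{c_0\le\sigma\le B}}) + 3\kappa_0\} + \sum_{X_i\in\mathcal{A}_0} \log\{f_{\mathcal{K}_{c_0\le\sigma\le B}}(X_i; \theta_{\mathcal{K}_{c_0\le\sigma\le B}}) + 3\kappa_0\} \\
& \quad \le \sum_{X_i\notin\mathcal{A}_0} \log\{f_{\mathcal{K}_R}(X_i; \theta_{\mathcal{K}_R}) + 4\kappa_0\} + \sum_{X_i\in\mathcal{A}_0} \log\{f_{\mathcal{K}_{c_0\le\sigma\le B}}(X_i; \theta_{\mathcal{K}_{c_0\le\sigma\le B}}) + 3\kappa_0\} \\
& \quad = \sum_{i=1}^n \log\{f_{\mathcal{K}_R}(X_i; \theta_{\mathcal{K}_R}) + 4\kappa_0\} \\
& \qquad + \sum_{X_i\in\mathcal{A}_0} \big[\log\{f_{\mathcal{K}_{c_0\le\sigma\le B}}(X_i; \theta_{\mathcal{K}_{c_0\le\sigma\le B}}) + 3\kappa_0\} - \log\{f_{\mathcal{K}_R}(X_i; \theta_{\mathcal{K}_R}) + 4\kappa_0\}\big]. \qquad (4.2.23)
\end{aligned}
$$

Note that $f_{\mathcal{K}_{c_0\le\sigma\le B}}(x; \theta_{\mathcal{K}_{c_0\le\sigma\le B}}) \le v_0/c_0$ from Lemma 4.1.4. Therefore

$$
\begin{aligned}
\text{the r.h.s. of (4.2.23)}
\le{} & \sum_{i=1}^n \log\{f_{\mathcal{K}_R}(X_i; \theta_{\mathcal{K}_R}) + 4\kappa_0\} + \sum_{X_i\in\mathcal{A}_0} \big[\log\{v_0/c_0 + 3\kappa_0\} - \log 4\kappa_0\big] \\
\le{} & \sum_{i=1}^n \log\{f_{\mathcal{K}_R}(X_i; \theta^{(s)}_{\mathcal{K}_R}, \rho(\theta^{(s)}_{\mathcal{K}_R})) + 4\kappa_0\} + R_n(\mathcal{A}_0) \cdot \log\left(\frac{v_0/c_0 + 3\kappa_0}{4\kappa_0}\right).
\end{aligned}
$$

This takes care of the first and the second term of (4.2.21). □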
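By Lemmas 4.2.4 and 4.2.5, the penalized log-likelihood function is bounded above as in the following
lemma.

Lemma 4.2.6. For $\theta \in \Theta'_{n,\mathcal{K},s}$,

$$
\begin{aligned}
\sum_{i=1}^n \log f(X_i;\theta) + \log r_n(\theta)
\le{} & \sum_{i=1}^n \log\big\{f_{\mathcal{K}_R}(X_i; \theta^{(s)}_{\mathcal{K}_R}, \rho(\theta^{(s)}_{\mathcal{K}_R})) + 4\kappa_0\big\} \\
& + R_n(\mathcal{A}_0) \cdot \log\left(\frac{v_0/c_0 + 3\kappa_0}{4\kappa_0}\right) \\
& + R_n(J_{c_n\le\sigma<c_0}(\theta)) \cdot (-\log 2\kappa_0) \\
& + \sum_{X_i\in J_{c_n\le\sigma<c_0}(\theta)} \log\{f(X_i;\theta) + \kappa_0\} \\
& + R_n(J_{\sigma<c_n}(\theta)) \cdot \log\frac{v_0}{\kappa_0} \\
& + \Big\{R_n(J_{\sigma<c_n}(\theta)) \cdot \log\frac{1}{\sigma_{(1)}} + \log r_n(\theta)\Big\}. \qquad (4.2.24)
\end{aligned}
$$

We bound the six terms of (4.2.24) in the following paragraphs.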



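The first term. We begin by bounding the first term of (4.2.24). By Lemma 4.1.8 and
the strong law of large numbers, we have

$$\limsup_{n\to\infty} \frac{1}{n} \sum_{i=1}^n \log\big\{f_{\mathcal{K}_R}(X_i; \theta^{(s)}_{\mathcal{K}_R}, \rho(\theta^{(s)}_{\mathcal{K}_R})) + 4\kappa_0\big\} < E_0[\log f(x;\theta_0)] - 4\lambda_0, \quad \text{a.s.} \qquad (4.2.25)$$

The second term. By (4.2.16) and the strong law of large numbers, we have

$$\limsup_{n\to\infty} \frac{1}{n} R_n(\mathcal{A}_0) \cdot \log\left(\frac{v_0/c_0 + 3\kappa_0}{4\kappa_0}\right) < \lambda_0, \quad \text{a.s.} \qquad (4.2.26)$$

The third term and the fourth term. The third term and the fourth term of (4.2.24)
can be bounded from above as follows:

$$\limsup_{n\to\infty} \sup_{\theta\in\Theta'_{n,\mathcal{K},s}} \frac{1}{n} R_n(J_{c_n\le\sigma<c_0}(\theta)) \cdot |\log 2\kappa_0| \le 3M \cdot u_0 \cdot 2\nu(c_0) \cdot |\log 2\kappa_0| < \lambda_0, \quad \text{a.s.} \qquad (4.2.27)$$

$$\limsup_{n\to\infty} \sup_{\theta\in\Theta'_{n,\mathcal{K},s}} \frac{1}{n} \sum_{X_i\in J_{c_n\le\sigma<c_0}(\theta)} \log\{f(X_i;\theta) + \kappa_0\} < \lambda_0, \quad \text{a.s.} \qquad (4.2.28)$$

The proofs of the above inequalities are similar to the proofs in Sections 4.3.4 and 4.3.5 of
the longer version of Tanaka and Takemura (2006), and are omitted.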
The fifth term and the sixth term. We now state the following lemma in order to
bound the fifth term and the sixth term of (4.2.24).

Lemma 4.2.7.

$$\mathrm{Prob}\Big(\sup_{\theta\in\Theta'_{n,\mathcal{K},s}} R_n(J_{\sigma<c_n}(\theta)) > M \ \text{i.o.}\Big) = 0. \qquad (4.2.29)$$

Proof: Let $w_n = \nu(c_n) = \nu(\exp(-n^d))$. Then (4.1.2), the assumption of Lemma 4.1.3, is
satisfied. From (4.2.19), we have

$$\sup_{\theta\in\Theta'_{n,\mathcal{K},s}} R_n(J_{\sigma<c_n}(\theta)) > M \ \Rightarrow\ \max_{\mu} R_n([\mu - w_n, \mu + w_n]) > 1.$$

Therefore, by Lemma 4.1.3, we obtain (4.2.29). □
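The implication used in this proof is a pigeonhole argument: $J_{\sigma<c_n}(\theta)$ is a union of at most $M$ intervals, so if it contains more than $M$ observations, at least one interval contains two of them. A minimal Python sketch of this counting step follows. The helper names `count_in_union` and `max_window_count` are hypothetical, and the sketch assumes every half-width $\nu(\sigma_m)$ is at most the common value $w$.

```python
from bisect import bisect_right
from typing import Sequence


def count_in_union(xs: Sequence[float],
                   centers: Sequence[float],
                   half_widths: Sequence[float]) -> int:
    """Number of observations lying in the union of [c - h, c + h]."""
    return sum(
        any(abs(x - c) <= h for c, h in zip(centers, half_widths))
        for x in xs
    )


def max_window_count(xs: Sequence[float], w: float) -> int:
    """Largest number of observations in any closed window [mu - w, mu + w].

    It is enough to slide a window of length 2w whose left end sits on an
    observation, so the sorted sample is scanned once with a binary search.
    """
    pts = sorted(xs)
    best = 0
    for i, left in enumerate(pts):
        j = bisect_right(pts, left + 2.0 * w)  # first index beyond the window
        best = max(best, j - i)
    return best


# Usage: M = 2 intervals capture 3 observations, so some window of
# half-width w = max(half_widths) must contain at least 2 observations.
sample = [0.0, 1.0, 2.0, 2.05]
captured = count_in_union(sample, centers=[0.0, 2.0], half_widths=[0.05, 0.1])
crowded = max_window_count(sample, w=0.1)
```

The lemma then only needs to rule out, eventually almost surely, the event that two observations fall within a window of half-width $w_n$.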

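By Lemma 4.2.7 and the same argument as in Section 4.2.2, we can ignore the event
$R_n(J_{\sigma<c_n}(\theta)) > M$. Then we have, uniformly for $\theta \in \Theta'_{n,\mathcal{K},s}$,

$$R_n(J_{\sigma<c_n}(\theta)) \cdot \log\frac{v_0}{\kappa_0} + \Big\{R_n(J_{\sigma<c_n}(\theta)) \cdot \log\frac{1}{\sigma_{(1)}} + \log r_n(\theta)\Big\} \le M \cdot \log\frac{v_0}{\kappa_0} + \log\frac{r_n(\theta)}{\sigma_{(1)}^M}, \quad \text{a.s.}$$

From (4.2.2), we obtain for $\theta \in \Theta'_n$,

$$R_n(J_{\sigma<c_n}(\theta)) \cdot \log\frac{v_0}{\kappa_0} + \Big\{R_n(J_{\sigma<c_n}(\theta)) \cdot \log\frac{1}{\sigma_{(1)}} + \log r_n(\theta)\Big\} \le M \cdot \log\frac{v_0}{\kappa_0}, \quad \text{a.s.}$$

Because the right-hand side of the above inequality is constant, we have

$$\limsup_{n\to\infty} \frac{1}{n} \cdot \sup_{\theta\in\Theta'_{n,\mathcal{K},s}} \Big[R_n(J_{\sigma<c_n}(\theta)) \cdot \log\frac{v_0}{\kappa_0} + \Big\{R_n(J_{\sigma<c_n}(\theta)) \cdot \log\frac{1}{\sigma_{(1)}} + \log r_n(\theta)\Big\}\Big] \le 0, \quad \text{a.s.} \qquad (4.2.30)$$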



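The end of the proof. Combining (4.2.24), (4.2.25), (4.2.26), (4.2.27), (4.2.28) and (4.2.30),
we obtain

$$\limsup_{n\to\infty} \sup_{\theta\in\Theta'_{n,\mathcal{K},s}} \frac{1}{n} \Big\{\sum_{i=1}^n \log f(X_i;\theta) + \log r_n(\theta)\Big\} \le E_0[\log f(x;\theta_0)] - \lambda_0, \quad \text{a.s.}$$

Therefore we obtain (4.2.18).

This completes the proof of Theorem 1.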
4.3 Proof of Theorem 2

The outline of the proof of Theorem 2 is similar to the proof of Theorem 1.

Partitioning the parameter space. Let $d$ be a constant defined by Assumption 10.
Define $c_n = c_0 \cdot \exp(-n^d)$ and $\Theta_{c_n} = \{\theta \in \Theta \mid \sigma_{(1)} \ge c_n\}$. The parameter space $\Theta$ is
divided into the two sets $\Theta_{c_n}$ and $\Theta^C_{c_n}$.

Because the asymptotic behavior is not affected by the penalty term, the penalized
maximum likelihood estimator over $\Theta_{c_n}$ is strongly consistent by Lemma 4.1.1. Therefore,
it suffices to prove the following equation.

$$\lim_{n\to\infty} \frac{\sup_{\theta\in\Theta^C_{c_n}} \big\{\prod_{i=1}^n f(X_i;\theta)\big\} \cdot s_n(\theta)}{\big\{\prod_{i=1}^n f(X_i;\theta_0)\big\} \cdot s_n(\theta_0)} = 0, \quad \text{a.s.}$$

Setting up constants. We set up some constants as in Section 4.2.3.

Let $\kappa_0$, $\lambda_0$ be real constants such that (4.2.12) holds. We can assume without loss
of generality that $c_0$ is sufficiently small to satisfy the inequalities (4.2.14). Take $A_0 > 0$
sufficiently large such that (4.2.16) holds. Let $\mathcal{A}_0 = (-\infty, -A_0) \cup (A_0, \infty)$ and
$A_n = A_0 \cdot n^{(2+\zeta)/(\beta-1)}$ as in Lemma 4.1.2. Recall that $\bar\beta = (\beta - 1)/\beta$, and that $B$ and $\xi$ are defined in
(4.2.13) and (4.2.15), respectively.



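Partitioning the parameter space. Partition $\{1, \ldots, M\}$ into disjoint subsets $\mathcal{K}_{\sigma<c_n}$,
$\mathcal{K}_{c_n\le\sigma<c_0}$, $\mathcal{K}_{\sigma>B}$, $\mathcal{K}_{|\mu|>A_0}$ and $\mathcal{K}_R$. For any given $\mathcal{K}_{\sigma<c_n}$, $\mathcal{K}_{c_n\le\sigma<c_0}$, $\mathcal{K}_{\sigma>B}$, $\mathcal{K}_{|\mu|>A_0}$ and
$\mathcal{K}_R$, we define a subset of $\Theta^C_{c_n}$ by

$$
\begin{aligned}
\Theta^C_{c_n,\mathcal{K}} = \{\theta \in \Theta^C_{c_n} \mid\ & \sigma_m < c_n,\ (m \in \mathcal{K}_{\sigma<c_n}); \\
& c_n \le \sigma_m < c_0,\ (m \in \mathcal{K}_{c_n\le\sigma<c_0}); \\
& \sigma_m > B,\ (m \in \mathcal{K}_{\sigma>B}); \\
& c_0 \le \sigma_m \le B,\ |\mu_m| > A_0,\ (m \in \mathcal{K}_{|\mu|>A_0}); \\
& c_0 \le \sigma_m \le B,\ |\mu_m| \le A_0,\ (m \in \mathcal{K}_R)\}.
\end{aligned}
$$

As above, it suffices to prove that for each choice of disjoint subsets $\mathcal{K}_{\sigma<c_n}$, $\mathcal{K}_{c_n\le\sigma<c_0}$,
$\mathcal{K}_{\sigma>B}$, $\mathcal{K}_{|\mu|>A_0}$ and $\mathcal{K}_R$,

$$\lim_{n\to\infty} \frac{\sup_{\theta\in\Theta^C_{c_n,\mathcal{K}}} \big\{\prod_{i=1}^n f(X_i;\theta)\big\} \cdot s_n(\theta)}{\big\{\prod_{i=1}^n f(X_i;\theta_0)\big\} \cdot s_n(\theta_0)} = 0, \quad \text{a.s.}$$

We fix $\mathcal{K}_{\sigma<c_n}$, $\mathcal{K}_{c_n\le\sigma<c_0}$, $\mathcal{K}_{\sigma>B}$, $\mathcal{K}_{|\mu|>A_0}$ and $\mathcal{K}_R$ from now on.

Next we consider coverings of $\bar\Theta_{\mathcal{K}_R}$. The following lemma follows immediately from
Lemma 4.1.8 and the compactness of $\bar\Theta_{\mathcal{K}_R}$.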
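Lemma 4.3.1. Let $\mathcal{B}(\theta, \rho(\theta))$ denote the open ball with center $\theta$ and radius $\rho(\theta)$. Then
$\bar\Theta_{\mathcal{K}_R}$ can be covered by a finite number of balls $\mathcal{B}(\theta^{(1)}_{\mathcal{K}_R}, \rho(\theta^{(1)}_{\mathcal{K}_R})), \ldots, \mathcal{B}(\theta^{(S)}_{\mathcal{K}_R}, \rho(\theta^{(S)}_{\mathcal{K}_R}))$ such
that

$$E_0\big[\log\{f_{\mathcal{K}_R}(x; \theta^{(s)}_{\mathcal{K}_R}, \rho(\theta^{(s)}_{\mathcal{K}_R})) + \kappa_0\}\big] + \lambda_0 < E_0[\log f(x; \theta_0)], \quad (s = 1, \ldots, S).$$

Based on Lemma 4.3.1, we partition $\Theta^C_{c_n,\mathcal{K}}$. Define a subset of $\Theta^C_{c_n,\mathcal{K}}$ by

$$\Theta^C_{c_n,\mathcal{K},s} = \{\theta \in \Theta^C_{c_n,\mathcal{K}} \mid \theta_{\mathcal{K}_R} \in \mathcal{B}(\theta^{(s)}_{\mathcal{K}_R}, \rho(\theta^{(s)}_{\mathcal{K}_R}))\}.$$

Then $\Theta^C_{c_n,\mathcal{K}}$ is covered by $\Theta^C_{c_n,\mathcal{K},1}, \ldots, \Theta^C_{c_n,\mathcal{K},S}$:

$$\Theta^C_{c_n,\mathcal{K}} = \bigcup_{s=1}^{S} \Theta^C_{c_n,\mathcal{K},s}.$$

Again it suffices to prove that for each choice of $\mathcal{K}_{\sigma<c_n}$, $\mathcal{K}_{c_n\le\sigma<c_0}$, $\mathcal{K}_{\sigma>B}$, $\mathcal{K}_{|\mu|>A_0}$, $\mathcal{K}_R$
and $s$,

$$\lim_{n\to\infty} \frac{\sup_{\theta\in\Theta^C_{c_n,\mathcal{K},s}} \big\{\prod_{i=1}^n f(X_i;\theta)\big\} \cdot s_n(\theta)}{\big\{\prod_{i=1}^n f(X_i;\theta_0)\big\} \cdot s_n(\theta_0)} = 0, \quad \text{a.s.} \qquad (4.3.1)$$

We fix $\mathcal{K}_{\sigma<c_n}$, $\mathcal{K}_{c_n\le\sigma<c_0}$, $\mathcal{K}_{\sigma>B}$, $\mathcal{K}_{|\mu|>A_0}$, $\mathcal{K}_R$ and $s$ from now on.

By Assumptions 9 and 11 and the law of large numbers, (4.3.1) is implied by

$$\limsup_{n\to\infty} \frac{1}{n} \sup_{\theta\in\Theta^C_{c_n,\mathcal{K},s}} \Big\{\sum_{i=1}^n \log f(X_i;\theta) + \log s_n(\theta)\Big\} < E_0[\log f(x;\theta_0)] \quad \text{a.s.} \qquad (4.3.2)$$

We prove (4.3.2) in the following paragraphs.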

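Bounding the penalized likelihood function by six terms. Define $J_{\sigma<c_n}(\theta)$ and
$J_{c_n\le\sigma<c_0}(\theta)$ as

$$J_{\sigma<c_n}(\theta) = \bigcup_{m\in\mathcal{K}_{\sigma<c_n}} [\mu_m - \nu(\sigma_m), \mu_m + \nu(\sigma_m)],$$

$$J_{c_n\le\sigma<c_0}(\theta) = \bigcup_{m\in\mathcal{K}_{c_n\le\sigma<c_0}} [\mu_m - \nu(\sigma_m), \mu_m + \nu(\sigma_m)]. \qquad (4.3.3)$$

The following lemma can be proved by a method similar to the proof of Lemma 4.2.6.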
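Lemma 4.3.2. For $\theta \in \Theta^C_{c_n,\mathcal{K},s}$,

$$
\begin{aligned}
\sum_{i=1}^n \log f(X_i;\theta) + \log s_n(\theta)
\le{} & \sum_{i=1}^n \log\big\{f_{\mathcal{K}_R}(X_i; \theta^{(s)}_{\mathcal{K}_R}, \rho(\theta^{(s)}_{\mathcal{K}_R})) + 4\kappa_0\big\} \\
& + R_n(\mathcal{A}_0) \cdot \log\left(\frac{v_0/c_0 + 3\kappa_0}{4\kappa_0}\right) \\
& + R_n(J_{c_n\le\sigma<c_0}(\theta)) \cdot (-\log 2\kappa_0) \\
& + \sum_{X_i\in J_{c_n\le\sigma<c_0}(\theta)} \log\{f(X_i;\theta) + \kappa_0\} \\
& + R_n(J_{\sigma<c_n}(\theta)) \cdot \log\frac{v_0}{\kappa_0} \\
& + \Big(R_n(J_{\sigma<c_n}(\theta)) \cdot \log\frac{1}{\sigma_{(1)}} + \log s_n(\theta)\Big). \qquad (4.3.4)
\end{aligned}
$$

We bound the six terms of (4.3.4) in the following paragraphs.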

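The first term. We begin by bounding the first term of (4.3.4). By Lemma 4.1.8 and
the strong law of large numbers, we have

$$\lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^n \log\big\{f_{\mathcal{K}_R}(X_i; \theta^{(s)}_{\mathcal{K}_R}, \rho(\theta^{(s)}_{\mathcal{K}_R})) + 4\kappa_0\big\} < E_0[\log f(x;\theta_0)] - 4\lambda_0, \quad \text{a.s.} \qquad (4.3.5)$$

The second term. By (4.2.16) and the strong law of large numbers, we have

$$\lim_{n\to\infty} \frac{1}{n} R_n(\mathcal{A}_0) \cdot \log\left(\frac{v_0/c_0 + 3\kappa_0}{4\kappa_0}\right) < \lambda_0, \quad \text{a.s.} \qquad (4.3.6)$$

The third term and the fourth term. The third term and the fourth term of (4.3.4) can
be bounded from above as follows:

$$\limsup_{n\to\infty} \sup_{\theta\in\Theta^C_{c_n,\mathcal{K},s}} \frac{1}{n} R_n(J_{c_n\le\sigma<c_0}(\theta)) \cdot |\log 2\kappa_0| \le 3M \cdot u_0 \cdot 2\nu(c_0) \cdot |\log 2\kappa_0| < \lambda_0, \quad \text{a.s.} \qquad (4.3.7)$$

$$\limsup_{n\to\infty} \sup_{\theta\in\Theta^C_{c_n,\mathcal{K},s}} \frac{1}{n} \sum_{X_i\in J_{c_n\le\sigma<c_0}(\theta)} \log\{f(X_i;\theta) + \kappa_0\} < \lambda_0, \quad \text{a.s.} \qquad (4.3.8)$$

The proofs of the above inequalities are similar to the proofs in Sections 4.3.4 and 4.3.5 of
the longer version of Tanaka and Takemura (2006), and are omitted.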
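The fifth term. We now state Lemma 4.3.3 in order to bound the fifth term of (4.3.4).

Lemma 4.3.3.

$$\mathrm{Prob}\Big(\sup_{\theta\in\Theta^C_{c_n,\mathcal{K},s}} R_n(J_{\sigma<c_n}(\theta)) > M \ \text{i.o.}\Big) = 0. \qquad (4.3.9)$$

Proof: Let $w_n = \nu(c_n) = \nu(\exp(-n^d))$. Then (4.1.2), the assumption of Lemma 4.1.3, is
satisfied. From (4.3.3), we have

$$\sup_{\theta\in\Theta^C_{c_n,\mathcal{K},s}} R_n(J_{\sigma<c_n}(\theta)) > M \ \Rightarrow\ \max_{\mu} R_n([\mu - w_n, \mu + w_n]) > 1.$$

Therefore, by Lemma 4.1.3, we obtain (4.3.9). □

By Lemma 4.3.3 and the same argument as in Section 4.2.2, we can ignore the event
$R_n(J_{\sigma<c_n}(\theta)) > M$. Then we have

$$\sup_{\theta\in\Theta^C_{c_n,\mathcal{K},s}} R_n(J_{\sigma<c_n}(\theta)) \cdot \log\frac{v_0}{\kappa_0} \le M \cdot \log\frac{v_0}{\kappa_0} \quad \text{a.s.}$$

Therefore, we obtain

$$\lim_{n\to\infty} \sup_{\theta\in\Theta^C_{c_n,\mathcal{K},s}} \frac{1}{n} \cdot R_n(J_{\sigma<c_n}(\theta)) \cdot \log\frac{v_0}{\kappa_0} = 0 \quad \text{a.s.} \qquad (4.3.10)$$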



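The sixth term. From Lemma 4.3.3 and the same argument as in Section 4.2.2, we have,
uniformly for $\theta \in \Theta^C_{c_n,\mathcal{K},s}$,

$$R_n(J_{\sigma<c_n}(\theta)) \cdot \log\frac{1}{\sigma_{(1)}} + \log s_n(\theta) \le \log\frac{s_n(\theta)}{\sigma_{(1)}^M} \quad \text{a.s.}$$

Furthermore, from Assumptions 9 and 10, we have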



M 



sn{e) _SnM JJ,„(a(„))<5^^-i.,.exp(n^ 



m=2 



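Note that $0 < d < 1$. Therefore we obtain, for $\theta \in \Theta^C_{c_n,\mathcal{K},s}$,

$$\limsup_{n\to\infty} \frac{1}{n} \cdot \Big\{R_n(J_{\sigma<c_n}(\theta)) \cdot \log\frac{1}{\sigma_{(1)}} + \log s_n(\theta)\Big\} \le 0 \quad \text{a.s.} \qquad (4.3.11)$$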

The end of the proof. From (4.3.5), (4.3.6), (4.3.7), (4.3.8), (4.3.10), (4.3.11) and
Lemma 4.3.2, we have

$$\limsup_{n\to\infty} \sup_{\theta\in\Theta^C_{c_n,\mathcal{K},s}} \frac{1}{n} \Big(\sum_{i=1}^n \log f(X_i;\theta) + \log s_n(\theta)\Big) \le E_0[\log f(x;\theta_0)] - \lambda_0 \quad \text{a.s.}$$

Therefore we obtain (4.3.2).

This completes the proof of Theorem 2.
5 Conclusion

For location-scale mixture distributions, we have shown consistency results for
two types of penalized maximum likelihood estimators. In Corollary 1, an open problem
mentioned in Hathaway (1985) and McLachlan and Peel (2000) has been solved positively as
follows:

It is possible to let the lower bound $b$ of the ratios of variances decrease to zero as
the sample size $n$ increases to infinity while maintaining consistency.

If the rate of convergence of $b$ is slower than $\exp(-n^d)$, where $d$ is a constant such
that $0 < d < 1$, then the maximum likelihood estimator is strongly consistent under
the constraint $\min_{m,m'} \sigma_m/\sigma_{m'} \ge b$.
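As a concrete reading of this constraint, the following Python sketch evaluates a shrinking lower bound of the exponential form $\exp(-n^d)$ and checks whether a vector of scale parameters satisfies the ratio constraint. The function names and numerical values are hypothetical, and the sketch does not settle whether the boundary rate $\exp(-n^d)$ itself is admissible; that depends on the exact conditions of the corollary.

```python
import math
from typing import Sequence


def lower_bound(n: int, d: float) -> float:
    """Shrinking lower bound b_n = exp(-n**d), assumed here with 0 < d < 1."""
    if not 0.0 < d < 1.0:
        raise ValueError("d must lie strictly between 0 and 1")
    return math.exp(-(n ** d))


def satisfies_ratio_constraint(sigmas: Sequence[float], b: float) -> bool:
    """Check min over (m, m') of sigma_m / sigma_m' >= b.

    The minimum ratio over all ordered pairs equals min(sigmas) / max(sigmas).
    """
    return min(sigmas) / max(sigmas) >= b


# Usage with made-up values: at n = 100 and d = 0.5 the bound is exp(-10).
b_100 = lower_bound(100, 0.5)
ok = satisfies_ratio_constraint([0.01, 1.0], b_100)
too_small = satisfies_ratio_constraint([1e-6, 1.0], b_100)
```

The bound relaxes as $n$ grows, so a configuration excluded at a small sample size may become admissible later.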



The assumptions on the penalties given in Section 3.1 or Section 3.2 are not very restrictive.
Note that the penalty does not have to depend on the sample size $n$. For example,
if we set fn{y) = r ■ y°'~^ and assume a > M + 1, then fn{y) satisfies the Assump- 
tion O and r„(6') = fnig^) satisfies the Assumption [7] and [HI The penalized likelihood 

(M) 

gn{0; X) corresponds to the posterior likelihood when we adopt a beta distribution as 
the prior of the minimum of the ratios of the scale parameters. Furthermore, if we set 
Sn{y) = ■ |/~('^+^) and assume > 0, then satisfies the Assumption M and [TUl 
and Sn{0) = Y[m=i ^nicTm) satisfies the Assumption [TTl The penalized likelihood hn{9; X) 
corresponds to the posterior likelihood when we adopt inverse gamma distributions as the 
priors of the scale parameters. 
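The effect of a penalty on the scale parameters can be illustrated numerically. The Python sketch below uses a two-component normal mixture and an unnormalized inverse gamma log-density as the penalty. The shape and scale values, the function names, and the toy data are assumptions made for this illustration and are not taken from the paper.

```python
import math
from typing import Sequence


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_loglik(xs: Sequence[float], alphas: Sequence[float],
                   mus: Sequence[float], sigmas: Sequence[float]) -> float:
    """Log-likelihood of a finite normal mixture."""
    return sum(
        math.log(sum(a * normal_pdf(x, m, s)
                     for a, m, s in zip(alphas, mus, sigmas)))
        for x in xs
    )


def log_inverse_gamma_penalty(sigmas: Sequence[float],
                              shape: float, scale: float) -> float:
    """Sum of unnormalized inverse gamma log-densities of the scales."""
    return sum(-(shape + 1.0) * math.log(s) - scale / s for s in sigmas)


def penalized_loglik(xs, alphas, mus, sigmas, shape, scale) -> float:
    """Log-likelihood plus the log-penalty on the scale parameters."""
    return (mixture_loglik(xs, alphas, mus, sigmas)
            + log_inverse_gamma_penalty(sigmas, shape, scale))


# Usage: centre the first component on the observation x = 0 and shrink
# its scale. Hypothetical data and hyperparameters.
data = [0.0, 1.0, 2.0, 3.0]
for s in (1.0, 1e-3, 1e-6):
    plain = mixture_loglik(data, [0.5, 0.5], [0.0, 1.5], [s, 1.0])
    pen = penalized_loglik(data, [0.5, 0.5], [0.0, 1.5], [s, 1.0], 1.0, 1.0)
    print(f"sigma_1={s:g}  loglik={plain:.3f}  penalized={pen:.3f}")
```

In this toy setting the unpenalized log-likelihood grows without bound as the first scale shrinks, while the penalized version is driven down by the $-\text{scale}/\sigma$ term, which is the mechanism the penalty is meant to provide.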

From Theorems 1 and 2, we can easily show that the consistency of the penalized likelihood
estimator holds even when restrictions on either the location or the scale parameters exist. If
we know that the true parameter is in the restricted parameter space and the assumptions
hold, then the consistency of the penalized maximum likelihood estimator holds by setting
$r_n(\theta) = 0$ or $s_n(\theta) = 0$ for all $n$ outside the restricted parameter space. For example,
suppose one considers a mixture of uniform distributions under the assumption that the
data are non-negative. Theorems 1 and 2 are applicable to this model.